WO2021022903A1 - Data processing method, apparatus, computer device and storage medium - Google Patents

Data processing method, apparatus, computer device and storage medium

Info

Publication number
WO2021022903A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
iteration
quantization
bit width
quantized
Prior art date
Application number
PCT/CN2020/095679
Other languages
English (en)
French (fr)
Inventor
刘少礼
周诗怡
张曦珊
曾洪博
黄迪
张尧
Original Assignee
安徽寒武纪信息科技有限公司
Priority date
Filing date
Publication date
Priority claimed from CN201910886905.6A
Priority claimed from CN201910888141.4A
Priority claimed from CN201910888599.XA
Application filed by 安徽寒武纪信息科技有限公司
Publication of WO2021022903A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/08 - Learning methods

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to a neural network data quantization method, device, computer equipment and storage medium.
  • A neural network is a mathematical or computational model that imitates the structure and function of biological neural networks. Through training on sample data, a neural network continuously corrects its network weights and thresholds so that the error function descends along the negative gradient and approaches the expected output. It is a widely used recognition and classification model, mostly applied to function approximation, model recognition and classification, data compression, and time series forecasting. Neural networks are used in image recognition, speech recognition, natural language processing, and other fields. However, as the complexity of neural networks increases, the amount of data and the number of data dimensions keep growing, which poses greater challenges to the data processing efficiency of the computing device, the capacity of the storage device, and the memory access efficiency.
  • A fixed bit width is used to quantize the operation data of the neural network, that is, the floating-point operation data is converted into fixed-point operation data, so as to compress the operation data of the neural network.
  • If the same quantization scheme is adopted for the entire neural network, the potentially large differences between different operation data of the neural network often lead to lower accuracy and affect the data operation results.
  • A neural network data quantization processing method applied to a processor, the method including:
  • the data to be quantized includes at least one of neurons, weights, and gradients of the neural network
  • the quantization parameters include point position parameters, scaling factors, and offsets.
  • a neural network data quantization processing device applied to a processor, and the device includes:
  • a data statistics module, which counts the data to be quantized according to the corresponding layer and the number of channels of the corresponding layer in the neural network operation process, and determines the statistical result of each kind of data to be quantized;
  • the quantization parameter determination module determines the quantization parameter of each type of data to be quantized in the corresponding layer according to the statistical result and the data bit width;
  • a quantization processing module uses corresponding quantization parameters to quantize the data to be quantized
  • the data to be quantized includes at least one of neurons, weights, and gradients of the neural network
  • the quantization parameters include point position parameters, scaling factors, and offsets.
  • an artificial intelligence chip characterized in that the chip includes the above-mentioned neural network data quantization processing device.
  • an electronic device including the above artificial intelligence chip.
  • a board card comprising: a storage device, an interface device, a control device, and the above artificial intelligence chip;
  • the artificial intelligence chip is connected to the storage device, the control device and the interface device respectively;
  • the storage device is used to store data
  • the interface device is used to implement data transmission between the artificial intelligence chip and external equipment
  • the control device is used to monitor the state of the artificial intelligence chip.
  • a non-volatile computer-readable storage medium having computer program instructions stored thereon, and when the computer program instructions are executed by a processor, the above-mentioned neural network data quantization processing method is realized.
  • The neural network data quantization processing method, device, computer equipment, and storage medium provided by the embodiments of the present disclosure count the data to be quantized according to the corresponding layer and the number of channels of the corresponding layer in the neural network operation process, and determine the statistical result of each kind of data to be quantized; determine the quantization parameter of each type of data to be quantized in the corresponding layer according to the statistical result and the data bit width; and use the corresponding quantization parameter to quantize the data to be quantized.
  • The neural network data quantization processing method, device, computer equipment, and storage medium provided in the embodiments of the present disclosure use corresponding quantization parameters to quantize the data to be quantized, which reduces the storage space occupied by stored data while ensuring accuracy, ensures the accuracy and reliability of the operation results, improves the efficiency of the operation, reduces the size of the neural network model, and lowers the performance requirements for the terminal running the neural network model.
  • The present disclosure proposes a method, device, and related products for adjusting the quantization parameters of a recurrent neural network, which can improve the quantization accuracy of the neural network and ensure the correctness and reliability of the operation results.
  • The present disclosure provides a method for adjusting the quantization parameters of a recurrent neural network, the method including:
  • the quantization parameters of the recurrent neural network are used to implement the quantization operation on the data to be quantized in the operation of the recurrent neural network.
  • The present disclosure also provides a quantization parameter adjustment device of a recurrent neural network, including a memory and a processor; the memory stores a computer program, and when the processor executes the computer program, the steps of any one of the methods described above are implemented. Specifically, when the processor executes the foregoing computer program, the following operations are implemented:
  • the quantization parameters of the recurrent neural network are used to implement the quantization operation on the data to be quantized in the operation of the recurrent neural network.
  • The present disclosure also provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed, the steps of any one of the above methods for adjusting the quantization parameters of a recurrent neural network are implemented. Specifically, when the aforementioned computer program is executed, the following operations are implemented:
  • the quantization parameters of the recurrent neural network are used to implement the quantization operation on the data to be quantized in the operation of the recurrent neural network.
  • The present disclosure also provides a quantization parameter adjustment device of a recurrent neural network, the device including:
  • an obtaining module, used to obtain the data variation range of the data to be quantized;
  • an iteration interval determination module, configured to determine a first target iteration interval according to the data variation range of the data to be quantized, so as to adjust the quantization parameters in the recurrent neural network operation according to the first target iteration interval, wherein the first target iteration interval includes at least one iteration, and the quantization parameters of the recurrent neural network are used to implement the quantization operation on the data to be quantized in the operation of the recurrent neural network.
  • The quantization parameter adjustment method, device, and related products for recurrent neural networks of the present disclosure obtain the data variation range of the data to be quantized and determine a first target iteration interval according to that variation range, so that the quantization parameters of the recurrent neural network can be adjusted according to the first target iteration interval, and the quantization parameters in different operation stages can be determined according to the data distribution characteristics of the data to be quantized.
  • The method and device of the present disclosure can improve accuracy in the quantization process of the recurrent neural network, thereby ensuring the accuracy and reliability of the operation results. Furthermore, quantization efficiency can be improved by determining the target iteration interval.
  • A neural network quantization method: for any layer to be quantized in the neural network, the method includes:
  • determining multiple pieces of data to be quantized in the target data of the layer to be quantized, where each piece of data to be quantized is a subset of the target data, the target data is any kind of data to be operated on and to be quantized in the layer to be quantized, and the data to be operated on includes at least one of input neurons, weights, biases, and gradients;
  • A neural network quantization device: for any layer to be quantized in the neural network, the device includes:
  • a data determining module, which determines multiple pieces of data to be quantized in the target data of the layer to be quantized, where each piece of data to be quantized is a subset of the target data, and the target data is any kind of data to be operated on and to be quantized in the layer to be quantized;
  • a data quantization module to quantize each of the data to be quantized according to the corresponding quantization parameter to obtain quantized data corresponding to each of the data to be quantized;
  • the data operation module obtains the quantization result of the target data according to the quantization data corresponding to each of the data to be quantized, so that the layer to be quantized performs operations according to the quantization result of the target data.
  • An artificial intelligence chip, characterized in that the chip includes the above-mentioned neural network quantization device.
  • an electronic device including the above artificial intelligence chip.
  • a board card includes: a storage device, an interface device, a control device, and the above artificial intelligence chip;
  • the artificial intelligence chip is connected to the storage device, the control device and the interface device respectively;
  • the storage device is used to store data
  • the interface device is used to implement data transmission between the artificial intelligence chip and external equipment
  • the control device is used to monitor the state of the artificial intelligence chip.
  • a non-volatile computer-readable storage medium having computer program instructions stored thereon, and when the computer program instructions are executed by a processor, the above neural network quantization method is realized.
  • The neural network quantization method, device, computer equipment, and storage medium include: determining multiple pieces of data to be quantized in the target data of the layer to be quantized, where each piece of data to be quantized is a subset of the target data, the target data is any kind of data to be operated on and to be quantized in the layer to be quantized, and the data to be operated on includes at least one of input neurons, weights, biases, and gradients; quantizing each piece of data to be quantized according to the corresponding quantization parameter to obtain quantized data corresponding to each piece of data to be quantized; and obtaining the quantization result of the target data according to the quantized data corresponding to each piece of data to be quantized, so that the layer to be quantized performs operations according to the quantization result of the target data.
  • The neural network quantization method, device, computer equipment, and storage medium provided in the embodiments of the present disclosure use corresponding quantization parameters to quantize multiple pieces of data to be quantized in the target data, which reduces the storage space occupied by stored data while ensuring accuracy, ensures the accuracy and reliability of the operation results, improves the efficiency of the operation, reduces the size of the neural network model, and lowers the performance requirements for the terminal running the neural network model.
  • Fig. 1 shows a schematic diagram of a processor according to an embodiment of the present disclosure.
  • Fig. 2-1 shows a flowchart of a neural network data quantization processing method according to an embodiment of the present disclosure.
  • Fig. 2-2 shows a schematic diagram of a symmetric fixed-point number representation according to an embodiment of the present disclosure.
  • Fig. 2-3 shows a schematic diagram of a fixed-point number representation introducing an offset according to an embodiment of the present disclosure.
  • Figs. 2-4a and 2-4b are graphs of the variation range of the weight data of a neural network during training.
  • Fig. 2-5 shows a block diagram of a neural network data quantization processing device according to an embodiment of the present disclosure.
  • Fig. 3-1 is a structural block diagram of the quantization parameter adjustment device 100'.
  • Fig. 3-2 shows a schematic diagram of the correspondence between data to be quantized and quantized data according to an embodiment of the present disclosure.
  • Fig. 3-3 shows a schematic diagram of the conversion of data to be quantized according to an embodiment of the present disclosure.
  • Fig. 3-4 shows a flowchart of a method for adjusting the quantization parameters of a recurrent neural network according to an embodiment of the present disclosure.
  • Fig. 3-5a shows a trend diagram of changes in the data to be quantized during an operation process according to an embodiment of the present disclosure.
  • Fig. 3-5b shows an unrolled schematic diagram of a recurrent neural network according to an embodiment of the present disclosure.
  • Fig. 3-5c shows a schematic diagram of the cycle of a recurrent neural network according to an embodiment of the present disclosure.
  • Fig. 3-6 shows a flowchart of a method for adjusting parameters of a recurrent neural network according to an embodiment of the present disclosure.
  • Fig. 3-7 shows a flowchart of a method for determining the variation range of a point position in an embodiment of the present disclosure.
  • Fig. 3-8 shows a flowchart of a method for determining a second average value in an embodiment of the present disclosure.
  • Fig. 3-9 shows a flowchart of a data bit width adjustment method in an embodiment of the present disclosure.
  • Fig. 3-10 shows a flowchart of a data bit width adjustment method in another embodiment of the present disclosure.
  • Fig. 3-11 shows a flowchart of a data bit width adjustment method in yet another embodiment of the present disclosure.
  • Fig. 3-13 shows a flowchart of a method for determining a second average value in another embodiment of the present disclosure.
  • Fig. 3-14 shows a flowchart of a quantization parameter adjustment method according to another embodiment of the present disclosure.
  • Fig. 3-15 shows a flowchart of adjusting quantization parameters in a quantization parameter adjustment method of an embodiment of the present disclosure.
  • Fig. 3-16 shows a flowchart of a method for determining a first target iteration interval in a parameter adjustment method of another embodiment of the present disclosure.
  • Fig. 3-17 shows a flowchart of a method for adjusting quantization parameters according to still another embodiment of the present disclosure.
  • Fig. 3-18 shows a structural block diagram of a quantization parameter adjustment device according to an embodiment of the present disclosure.
  • Fig. 4-1 shows a flowchart of a neural network quantization method according to an embodiment of the present disclosure.
  • Fig. 4-2 shows a schematic diagram of determining the data to be quantized from an input neuron according to a convolution kernel, according to an embodiment of the present disclosure.
  • Fig. 4-3 shows another schematic diagram of determining the data to be quantized from an input neuron according to a convolution kernel, according to an embodiment of the present disclosure.
  • Fig. 4-4 shows a schematic diagram of a symmetric fixed-point number representation according to an embodiment of the present disclosure.
  • Fig. 4-5 shows a schematic diagram of a fixed-point number representation introducing an offset according to an embodiment of the present disclosure.
  • Fig. 4-6 shows a flowchart of a neural network quantization method according to an embodiment of the present disclosure.
  • Fig. 4-7 shows a block diagram of a neural network quantization device according to an embodiment of the present disclosure.
  • Fig. 5 shows a structural block diagram of a board card according to an embodiment of the present disclosure.
  • Depending on the context, the term “if” can be interpreted as “when”, “once”, “in response to determining”, or “in response to detecting”.
  • Similarly, depending on the context, the phrase “if determined” or “if [the described condition or event] is detected” can be interpreted as “once determined”, “in response to determining”, “once [the described condition or event] is detected”, or “in response to detecting [the described condition or event]”.
  • Fig. 1 shows a schematic diagram of a processor according to an embodiment of the present disclosure.
  • the processor 100 can execute the following methods.
  • The processor 100 includes multiple processing units 101 and a storage unit 102.
  • The multiple processing units 101 are used to execute instruction sequences; the storage unit 102 is used to store data, and may include random access memory (RAM) and a register file.
  • the multiple processing units 101 in the processor 100 can share part of the storage space, for example, share part of the RAM storage space and the register file, and can also have their own storage space at the same time.
  • The neural network data quantization processing method, device, computer equipment, and storage medium provided by the embodiments of the present disclosure count the data to be quantized according to the corresponding layer and the number of channels of the corresponding layer in the neural network operation process, determine the statistical result of each kind of data to be quantized, and determine the quantization parameters of each type of data to be quantized in the corresponding layer according to the statistical results and the data bit width.
  • the size of the neural network model is reduced, and the performance requirements for the terminal running the neural network model are reduced, so that the neural network model can be applied to terminals such as mobile phones with relatively limited computing power, size, and power consumption.
  • The neural network data quantization processing method can be applied to a processor. The processor can be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) used to perform artificial intelligence operations.
  • Artificial intelligence operations may include machine learning operations, brain-like operations, and so on; machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like.
  • The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing) unit, and an FPGA (Field-Programmable Gate Array) chip.
  • The processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run the various tasks assigned to it, such as convolution operation tasks, pooling tasks, or fully connected tasks.
  • the present disclosure does not limit the processing unit and the tasks executed by the processing unit.
  • Figure 2-1 shows a flowchart of a neural network data quantization processing method according to an embodiment of the present disclosure.
  • the method may include step S11 to step S13.
  • This method can be applied to the processor 100 shown in FIG. 1.
  • the processing unit 101 is configured to execute steps S11 to S13.
  • the storage unit 102 is used to store data related to the processing from step S11 to step S13, such as the data to be quantized, statistical results, quantization parameters, and data bit width.
  • In step S11, the data to be quantized is counted according to the corresponding layer and the number of channels of the corresponding layer in the neural network operation process, and the statistical result of each type of data to be quantized is determined.
  • The data to be quantized includes at least one of the neurons, weights, and gradients of the neural network.
  • In step S12, the quantization parameter of each type of data to be quantized in the corresponding layer is determined according to the statistical result and the data bit width.
  • The quantization parameters include point position parameters, scaling factors, and offsets.
  • In step S13, the data to be quantized is quantized using the corresponding quantization parameter.
  • The corresponding layer in the neural network operation process can be any layer involved in the neural network operation, such as a convolutional layer, a fully connected layer, or a pooling layer; the layer that performs operations or processing is not limited in this disclosure.
  • the data to be quantized is data expressed in a high-precision data format, and the quantized data is expressed in a low-precision data format.
  • the accuracy of the data format of the data to be quantized is higher than that of the quantized data.
  • A corresponding quantization method can be used according to the number of channels of the corresponding layer; the quantization methods include the following Method 1 and Method 2.
  • Method 1: the quantization parameter and data bit width corresponding to each type of data to be quantized in the corresponding layer are the same. Method 1 applies when the channel of the corresponding layer is a single channel or the corresponding layer has no channel: the quantization parameter of each kind of data to be quantized is determined according to the statistical result and the data bit width.
  • Method 2: applies when the channels of the corresponding layer are multiple channels. The scaling factors and offsets of the weights in the same channel of the corresponding layer are the same, the point position parameters and data bit widths corresponding to the weights in all channels of the corresponding layer are the same, the quantization parameters and data bit widths corresponding to the neurons in all channels of the corresponding layer are the same, and the quantization parameters and data bit widths corresponding to the gradients in all channels of the corresponding layer are the same.
  • For the neurons, statistics can be performed on the neurons in the corresponding layer to determine the statistical result of the neurons for that layer, and the quantization parameters corresponding to the neurons in all channels of the corresponding layer are then determined according to that statistical result and the data bit width.
  • For the gradients, statistics can likewise be performed on the gradients in the corresponding layer to determine the statistical result of the gradients for that layer, and the quantization parameters corresponding to the gradients in all channels of the corresponding layer are then determined according to that statistical result and the data bit width.
  • For the weights, statistics can be performed on the weights in each channel of the corresponding layer to obtain a first statistical result of the weights in each channel, and the scaling factor and offset of the weights in each channel of the corresponding layer are determined according to the first statistical result and the data bit width.
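  • As a rough illustration of the two methods, the following Python sketch (all names are illustrative and assume weight tensors whose first axis is the channel axis; this is not code from the patent) computes the per-layer statistic used by Method 1 and the per-channel statistics used by Method 2:

```python
import numpy as np

def layer_abs_max(data):
    # Method 1: a single statistical result for the whole layer,
    # here the maximum absolute value over all elements.
    return float(np.max(np.abs(data)))

def channel_abs_max(weights):
    # Method 2: one statistical result per channel; `weights` is assumed
    # to have shape (C, ...) with the channels along the first axis.
    return [float(np.max(np.abs(w))) for w in weights]

# Example: a convolutional layer with 4 output channels.
w = np.random.randn(4, 3, 3, 3).astype(np.float32)
print(layer_abs_max(w))    # Z for the whole layer (Method 1)
print(channel_abs_max(w))  # Z(c) for each channel c (Method 2)
```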
  • the neural network operation process may include at least one operation of neural network training, neural network inference, and neural network fine-tuning.
  • training of a neural network refers to a process of performing multiple iterations on a neural network (the weight of the neural network may be a random number), so that the weight of the neural network can meet a preset condition.
  • The training of a neural network includes forward processing and back-propagation of gradients.
  • In forward processing, a neural network operation is performed according to the input data to obtain the operation result.
  • In the back-propagation process, the error value is determined according to the forward output result of the forward processing and the expected output result, and the weight gradient and/or the input data gradient is determined according to the error value; the gradient is the derivative of the error value.
  • The training process of the neural network is as follows: the processor uses a neural network whose weights are random numbers to perform forward processing on the input data to obtain a forward processing result; the processor then determines an error value according to the forward processing result and a preset reference value, and determines the weight gradient and/or the input data gradient according to the error value; finally, the processor updates the weights of the neural network according to the weight gradient, obtaining new weights and completing one iteration. The processor executes multiple iterations in a loop until the forward processing result of the neural network meets the preset condition. A minimal sketch of this loop is given below.
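  • The loop below is a minimal, hypothetical sketch of that training procedure; `forward` and `backward` stand in for the network's forward processing and gradient computation and are not defined by the patent:

```python
import numpy as np

def train(forward, backward, weights, inputs, reference,
          lr=0.01, max_iters=1000, tol=1e-3):
    # `weights` start as random numbers; one loop body = one iteration.
    for _ in range(max_iters):
        result = forward(weights, inputs)                   # forward processing
        error = float(np.mean((result - reference) ** 2))   # error vs. preset reference
        if error < tol:                                     # preset condition met
            break
        grad = backward(weights, inputs, result, reference) # weight gradient
        weights = weights - lr * grad                       # update weights with the gradient
    return weights
```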
  • Neural network fine-tuning refers to the process of performing multiple iterative operations on the neural network (the weight of the neural network is already in a convergent state rather than a random number), so that the accuracy of the neural network can meet the preset requirements.
  • This fine-tuning process is basically the same as the above-mentioned training process, and can be regarded as a process of retraining the neural network in a convergent state.
  • Neural network inference refers to the process of using a neural network whose weights meet preset conditions to perform forward processing to realize functions such as recognition or classification, such as the use of neural network for image recognition and so on.
  • Each time the weights of the neural network are updated once using the gradient, this is called an iteration.
  • In order to obtain a neural network whose accuracy meets expectations, a very large sample data set is required during training, and it is impossible to input the entire sample data set into the computer at once. Therefore, the sample data set is divided into multiple blocks, each block is passed to the computer in turn, and after each block is processed forward, the weights of the neural network are correspondingly updated once.
  • Take a neural network including 5 convolutional layers and 3 fully connected layers as an example.
  • In the convolutional layers and the fully connected layers, the neurons in each layer are quantized separately, and all the neurons in the same layer share the same point position parameter, scaling factor, and offset.
  • Each neuron has a corresponding data bit width, which can be a preset value or a value adjusted according to the quantization error corresponding to the data bit width.
  • For the weights in the fully connected layers, "Method 1" can be used to quantize the weights of each corresponding layer: all weights in the same layer share the same point position parameter, scaling factor, and offset.
  • The weights in the convolutional layers can be quantized with the above "Method 2", that is, the scaling factors and offsets of the weights in the same channel of the corresponding layer are the same, while the weights in all channels of the corresponding layer share the same point position parameter. It is also possible to use "Method 1" to quantize the weights of each corresponding layer, with all weights in the same layer sharing the same point position parameter, scaling factor, and offset. Each weight has a corresponding data bit width, which can be a preset value or a value adjusted according to the quantization error corresponding to the data bit width.
  • When the weights in the convolutional layers are quantized with "Method 1", the accuracy is relatively low but the calculation speed is relatively high.
  • In the convolutional layers and the fully connected layers, the gradients in each layer are quantized separately, and all the gradients in the same layer share the same point position parameter, scaling factor, and offset.
  • Each gradient has a corresponding data bit width, which can be a preset value or a value adjusted according to the quantization error corresponding to the data bit width.
  • the statistical result may include any one of the following: the maximum absolute value of each type of data to be quantized, and one half of the distance between the maximum value and the minimum value of each type of data to be quantized.
  • The maximum absolute value is the larger of the absolute values of the maximum value and the minimum value in each type of data to be quantized.
  • When the corresponding layer has no channel (such as a fully connected layer) or is a single channel, the statistical result can be the maximum absolute value of each type of data to be quantized in the corresponding layer, or one half of the distance between the maximum value and the minimum value.
  • When the corresponding layer has multiple channels, the statistical result can be the maximum absolute value of each kind of data to be quantized in the corresponding layer, or one half of the distance between the maximum value and the minimum value, and it can also include the maximum absolute value of each kind of data to be quantized in the different channels of the corresponding layer, or one half of the per-channel distance between the maximum and the minimum.
  • The maximum absolute value of each type of data to be quantized in the corresponding layer, or in a certain channel of the corresponding layer, can be confirmed from the maximum and minimum values of that data.
  • Since the maximum and minimum values corresponding to the data to be quantized are normally saved, the maximum absolute value can be obtained directly from the saved maximum and minimum values, without consuming additional resources to compute the absolute values of the data to be quantized, which saves the time needed to determine the statistical result, as sketched below.
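  • For instance, both candidate statistics can be recovered directly from the saved extrema, along the lines of this illustrative sketch:

```python
def abs_max_from_saved(z_min, z_max):
    # Maximum absolute value, obtained from the already-saved min/max
    # without another pass over the data to be quantized.
    return max(abs(z_min), abs(z_max))

def half_range_from_saved(z_min, z_max):
    # One half of the distance between the maximum and the minimum.
    return (z_max - z_min) / 2
```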
  • the scaling factor may be determined according to the point position parameter, statistical result, and data bit width.
  • the offset is determined according to the statistical results of each type of data to be quantized.
  • the data bit width may be a preset value.
  • the preset value of the data bit width may be 8 bits.
  • The data to be quantized can be quantized based on formula (1-1) or formula (1-2) to obtain the quantized data I_x.
  • s is the point position parameter, an integer related to the decimal point position of the fixed-point number.
  • f is the scaling factor, a rational number with f ∈ (0.5, 1].
  • O is the offset, a rational number.
  • round represents a rounding operation (rounding to the nearest integer); round can also be replaced by other rounding operations, such as rounding up, rounding down, or rounding toward zero, which are not limited in this disclosure.
  • For example, 8-bit quantization can be performed, that is, n is 8, and the value range of I_x is [-128, 127].
  • The maximum floating-point value A that an n-bit fixed-point number can represent is 2^s × (2^(n-1) - 1); thus an n-bit fixed-point number can represent a maximum value of 2^s × (2^(n-1) - 1) and a minimum value of -2^s × (2^(n-1) - 1) in the number field of the data to be quantized. It can be seen from formula (1-1) that when the quantization parameter corresponding to the first case is used to quantize the data to be quantized, the quantization interval is 2^s × f, denoted as C.
  • With the offset introduced, the n-bit fixed-point number can represent a maximum value of (2^(n-1) - 1) × 2^s × f + O in the number field of the data to be quantized, and a minimum value of -(2^(n-1) - 1) × 2^s × f + O.
  • The quantized n-bit binary representation I_x of data x can be inverse-quantized according to formula (1-4) to obtain the dequantized data, where the data format of the dequantized data is the same as the data format of the corresponding data F_x before quantization; both are floating-point values.
  • When the channel of the corresponding layer is a single channel or the corresponding layer has no channel, the above quantization parameters are shared by the whole layer; when the corresponding layer has multiple channels, f^(c) is the scaling factor of the c-th channel of the corresponding layer, and Z^(c) is the statistical result of the c-th channel of the corresponding layer.
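  • The bodies of formulas (1-1) through (1-4) are not reproduced on this page, but a quantize/dequantize pair consistent with the definitions above (quantization interval 2^s × f, offset O, n-bit range) might look like the following sketch; the clamping bounds and the default rounding mode are assumptions:

```python
import numpy as np

def quantize(fx, s, f, o, n=8):
    # Map floating-point data fx to an n-bit fixed-point value I_x.
    # Quantization interval C = 2**s * f; O is the offset (0 when the
    # number field is symmetric about zero).
    c = (2.0 ** s) * f
    ix = np.round((fx - o) / c)  # 'round' may be swapped for ceil, floor, trunc, ...
    lo, hi = -2 ** (n - 1), 2 ** (n - 1) - 1  # e.g. [-128, 127] for n = 8
    return np.clip(ix, lo, hi).astype(np.int32)

def dequantize(ix, s, f, o):
    # Inverse quantization: a floating-point approximation of the original F_x.
    return ix * (2.0 ** s) * f + o

x = np.array([-1.2, 0.0, 0.37, 0.9], dtype=np.float32)
q = quantize(x, s=-7, f=0.75, o=0.0)
print(q, dequantize(q, s=-7, f=0.75, o=0.0))
```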
  • FIG. 2-2 shows a schematic diagram of a symmetric fixed-point number representation according to an embodiment of the present disclosure.
  • the number field of the data to be quantized is distributed with "0" as the symmetric center.
  • Z is the maximum absolute value of all floating-point numbers in the number field of the data to be quantized.
  • A is the maximum value of the floating-point number that can be represented by an n-bit fixed-point number.
  • The floating-point number A converts to the fixed-point number 2^(n-1) - 1. To avoid overflow, A needs to cover Z.
  • Fig. 2-3 shows a schematic diagram of a fixed-point number representation introducing an offset according to an embodiment of the present disclosure.
  • the number field of the data to be quantized is not distributed symmetrically with "0" as the center.
  • Z_min is the minimum value of all floating-point numbers in the number field of the data to be quantized, and Z_max is the maximum value; P is the center point between Z_min and Z_max.
  • the number field of the data to be quantized is shifted as a whole, so that the number field of the data to be quantized after translation is distributed with "0" as the symmetric center.
  • The maximum absolute value in the shifted number field is Z. As can be seen from Fig. 2-3, the offset is the horizontal distance from point "0" to point "P", and this distance is called the offset O, where O = (Z_max + Z_min) / 2, as sketched below.
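  • A small sketch of this shift; the expressions for O and Z follow from the geometry of Fig. 2-3 rather than from text reproduced on this page:

```python
def offset_and_abs_max(z_min, z_max):
    # P is the center point between z_min and z_max; the offset O is the
    # horizontal distance from "0" to P. After shifting the number field
    # by O it is symmetric about zero with absolute maximum Z.
    o = (z_max + z_min) / 2
    z = (z_max - z_min) / 2
    return o, z

print(offset_and_abs_max(-0.5, 3.5))  # -> (1.5, 2.0)
```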
  • the point position parameter and the scaling factor are both related to the data bit width. Different data bit widths result in different point position parameters and scaling factors, thereby affecting the quantization accuracy.
  • Quantization is the process of converting high-precision numbers, previously expressed with 32 bits or 64 bits, into fixed-point numbers that occupy less memory space; converting high-precision numbers into fixed-point numbers causes a certain loss in accuracy. During training or fine-tuning, within a certain range of iterations, using the same data bit width for quantization has little effect on the overall accuracy of the neural network operations; beyond a certain number of iterations, quantization with the same data bit width can no longer meet the accuracy requirements of training or fine-tuning.
  • Therefore, the data bit width n can be adjusted along with the training or fine-tuning process.
  • The data bit width n can also be set manually to a preset value; within different ranges of iteration counts, the corresponding preset data bit width n is used.
  • the method may further include: adjusting the data bit width according to the quantization error corresponding to the data bit width, so as to determine the quantization parameter using the adjusted data bit width.
  • the quantization error is determined according to the quantized data in the corresponding layer and the corresponding data before quantization.
  • Adjusting the data bit width according to the quantization error corresponding to the data bit width may include: comparing the quantization error with thresholds, and adjusting the data bit width according to the comparison result.
  • The thresholds may include at least one of a first threshold and a second threshold, where the first threshold is greater than the second threshold.
  • Comparing the quantization error with the thresholds and adjusting the data bit width according to the comparison result may include any of the following: if the quantization error is greater than or equal to the first threshold, the data bit width is increased; if the quantization error is less than or equal to the second threshold, the data bit width is reduced; if the quantization error is between the first threshold and the second threshold, the data bit width remains unchanged.
  • the first threshold and the second threshold may be empirical values, or may be variable hyperparameters. Conventional optimization methods for hyperparameters are suitable for both the first threshold and the second threshold, and the hyperparameter optimization scheme will not be repeated here.
  • The data bit width can be adjusted by a fixed number of bits, or by a variable adjustment step according to the difference between the quantization error and the error threshold, so that the data bit width becomes longer or shorter according to the actual needs of the neural network operation process.
  • For example, if the data bit width n of the current convolutional layer is 16 and is adjusted to 12 according to the quantization error, then in practical applications a data bit width of 12 instead of 16 suffices to meet the accuracy requirements of the neural network operation process, so the fixed-point operation speed can be greatly increased within the allowable accuracy range, improving the resource utilization of the artificial intelligence processor chip. A threshold-comparison sketch is given below.
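  • A threshold comparison of this kind could be sketched as follows; the fixed step size and the exact comparison operators are assumptions, and the patent also allows a variable step derived from the gap between the error and the threshold:

```python
def adjust_bit_width(n, quant_error, first_threshold, second_threshold, step=4):
    # first_threshold > second_threshold
    if quant_error >= first_threshold:
        return n + step   # error too large: lengthen the data bit width
    if quant_error <= second_threshold:
        return n - step   # accuracy to spare: shorten the data bit width
    return n              # error between the thresholds: keep the bit width

print(adjust_bit_width(16, quant_error=0.01, first_threshold=0.1,
                       second_threshold=0.02))  # 16 -> 12, as in the example above
```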
  • The method may further include: dequantizing the quantized data to obtain dequantized data, where the data format of the dequantized data is the same as the data format of the corresponding data before quantization; and determining the quantization error according to the data before quantization and the corresponding dequantized data.
  • the data before quantization may be data to be quantized.
  • the processor may calculate the quantization error according to the data to be quantized and the corresponding inverse quantization data.
  • The processor may determine an error term according to the data to be quantized Z and its corresponding dequantized data Z^(n), and determine the quantization error according to the error term.
  • For example, the processor may calculate the differences between the data to be quantized Z and the corresponding dequantized data Z^(n), obtain m difference values, and use the sum of the m difference values as the error term; the processor can then determine the quantization error according to the error term.
  • The specific quantization error can be determined from the m difference values, where i is the subscript of the i-th piece of data to be quantized in the data set to be quantized, and i is an integer greater than or equal to 1 and less than or equal to m.
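  • The exact error formula is not reproduced on this page; the sketch below therefore uses the stated error term (the sum of the m differences) with a simple normalization as a stand-in:

```python
import numpy as np

def quantization_error(z, z_hat):
    # z: the m data to be quantized; z_hat: their dequantized counterparts.
    error_term = np.sum(np.abs(z - z_hat))   # sum of the m difference values
    # Normalizing by the data magnitude is an assumption, not the patent's formula.
    return float(error_term / (np.sum(np.abs(z)) + 1e-12))
```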
  • Figure 2-4a and Figure 2-4b are graphs of the variation range of the weight data of the neural network during the training process.
  • the abscissa represents the number of iterations
  • the ordinate represents the maximum value of the weight after taking the logarithm.
  • the weight data variation amplitude curve shown in Figure 2-4a shows the weight data variation corresponding to different iterations of any convolutional layer of the neural network in the same period (epoch).
  • the conv0 layer corresponds to the weight data change range curve A
  • the conv1 layer corresponds to the weight data change range curve B
  • the conv2 layer corresponds to the weight data change range curve C
  • the conv3 layer corresponds to the weight data change range curve D
  • The conv4 layer corresponds to the weight data variation range curve E.
  • the weight data of the corresponding layer of each generation has similarity within a certain iteration interval.
  • In such a case, the data bit width used in the quantization of the corresponding layer in the previous iteration can be used to quantize the weight data of the corresponding layer of the current generation, or the weight data of the current layer can be quantized based on the preset data bit width n of the current layer, to obtain the quantized fixed-point numbers. The quantization error is determined according to the quantized weight data and the corresponding weight data before quantization.
  • According to the comparison result between the quantization error and the thresholds, the data bit width used when quantizing the corresponding layer of the previous generation, or the preset data bit width of the current layer, is adjusted, and the adjusted data bit width is applied to the quantization of the weight data of the corresponding layer of the current generation.
  • In addition, the weight data of different layers of the neural network are independent of each other and not similar, and likewise the neuron data of different layers are independent of each other and not similar. Therefore, in neural network training or fine-tuning, the data bit width determined for each layer in each iteration applies only to the corresponding layer.
  • The above takes the weight data as an example; in neural network training or fine-tuning, the neuron data and the gradient data are treated similarly with their corresponding data bit widths, which will not be repeated here.
  • the data before quantization is the data to be quantized involved in the weight update iteration process within the target iteration interval.
  • the target iteration interval may include at least one weight update iteration, and the same data bit width is used in the quantization process within the same target iteration interval.
  • The target iteration interval is determined according to the change trend value of the point position parameter of the data to be quantized involved in the weight update iteration process at the prediction time point;
  • or the target iteration interval is determined according to both the change trend value of the point position parameter and the change trend value of the data bit width of the data to be quantized involved in the weight update iteration process at the prediction time point.
  • The prediction time point is the time point at which it is judged whether the data bit width needs to be adjusted, and it corresponds to the time point when a weight update iteration is completed.
  • the step of determining the target iteration interval may include:
  • The change trend value of the point position parameter is determined according to the moving average of the point position parameter in the weight iteration process corresponding to the current prediction time point and the moving average of the point position parameter in the weight iteration process corresponding to the previous prediction time point, or according to the point position parameter in the weight iteration process corresponding to the current prediction time point and the moving average of the point position parameter in the weight iteration process corresponding to the previous prediction time point.
  • The expression of formula (1-6) is:
  • diff_update1 = |M^(t) - M^(t-1)|    (1-6)
  • M is the moving average of the point position parameter s, increasing with the training iterations; M^(t) is the moving average of the point position parameter s corresponding to the t-th prediction time point, obtained according to formula (1-7):
  • M^(t) = α × s^(t) + (1 - α) × M^(t-1)    (1-7)
  • s^(t) is the point position parameter s corresponding to the t-th prediction time point; M^(t-1) is the sliding average of the point position parameter s corresponding to the (t-1)-th prediction time point; α is a hyperparameter.
  • diff_update1 measures the change trend of the point position parameter s; since a change of the point position parameter s is also reflected, in disguised form, in the change of the maximum value Z_max of the current data to be quantized, a larger diff_update1 indicates a drastically changing value range and calls for a shorter update interval, that is, a smaller target iteration interval.
  • The target iteration interval is determined according to formula (1-8); the same data bit width is used in the quantization process within the same target iteration interval, and the data bit widths used in the quantization processes within different target iteration intervals may be the same or different:
  • I = β / diff_update1 - γ    (1-8)
  • I is the target iteration interval, diff_update1 is the change trend value of the point position parameter, and β and γ are hyperparameters. A sketch of formulas (1-6) to (1-8) is given below.
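  • The formula bodies above are reconstructions from the surrounding definitions rather than text reproduced on this page; one reading consistent with them is sketched below (the absolute difference in (1-6) and the flooring and lower-bound guard in (1-8) are assumptions):

```python
def moving_average(s_t, m_prev, alpha):
    # Formula (1-7): sliding average of the point position parameter s.
    return alpha * s_t + (1 - alpha) * m_prev

def point_position_trend(m_t, m_prev):
    # Formula (1-6): change trend value of the point position parameter.
    return abs(m_t - m_prev)

def target_iteration_interval(trend, beta, gamma):
    # Formula (1-8): a larger trend value yields a smaller interval.
    return max(1, int(beta / trend - gamma))
```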
  • The prediction time points include a first prediction time point, which is determined according to the target iteration interval. Specifically, at the t-th prediction time point in the training or fine-tuning process, the weight data of the corresponding layer of the current generation is quantized using the data bit width used in the quantization of the corresponding layer of the previous generation to obtain the quantized fixed-point numbers, and the quantization error is determined according to the quantized weight data and the corresponding weight data before quantization. The quantization error is compared with the first threshold and the second threshold respectively, and the comparison result is used to determine whether to adjust the data bit width used in the quantization of the corresponding layer of the previous generation.
  • For example, suppose the t-th first prediction time point corresponds to the 100th generation, and the data bit width used by the 99th generation is n1.
  • At the 100th generation, the quantization error is determined based on the data bit width n1, and the quantization error is compared with the first threshold and the second threshold to obtain a comparison result. If the comparison result confirms that the data bit width n1 does not need to be changed, formula (1-8) is used to confirm that the target iteration interval is 8 generations.
  • If the 100th generation is taken as the initial iteration within the current target iteration interval, the 100th to 107th generations serve as the current target iteration interval; if the 100th generation is taken as the last iteration of the previous target iteration interval, the 101st to 108th generations serve as the current target iteration interval.
  • Within the current target iteration interval, each generation still uses the data bit width n1 used in the previous target iteration interval.
  • In this case, the data bit widths used in quantization in different target iteration intervals can be the same. If the 100th to 107th generations serve as the current target iteration interval, the 108th generation in the next target iteration interval is taken as the (t+1)-th first prediction time point; if the 101st to 108th generations serve as the current target iteration interval, the 108th generation in the current target iteration interval is taken as the (t+1)-th first prediction time point.
  • At the (t+1)-th first prediction time point, the quantization error is determined according to the data bit width n1, and the quantization error is compared with the first threshold and the second threshold to obtain a comparison result.
  • If it is determined from the comparison result that the data bit width n1 needs to be changed to n2, the target iteration interval is confirmed to be 55 generations using formula (1-8).
  • Then the 108th to 163rd generations, or the 109th to 163rd generations, serve as the target iteration interval, and each generation within this target iteration interval uses the data bit width n2 during quantization.
  • the data bit width used in quantization can be different between different target iteration intervals.
  • Formula (1-6) applies when obtaining the change trend value of the point position parameter. If the first prediction time point at the current moment is the initial iteration of the current target iteration interval, then in formula (1-6), M^(t) is the moving average, increasing with the training iterations, of the point position parameter s corresponding to the time point of the initial iteration of the current target iteration interval; s^(t) is the point position parameter s corresponding to the time point of the initial iteration of the current target iteration interval; and M^(t-1) is the moving average, increasing with the training iterations, of the point position parameter s corresponding to the time point of the initial iteration of the previous target iteration interval.
  • If the first prediction time point at the current moment is the last iteration of the current target iteration interval, then in formula (1-6), M^(t) is the moving average, increasing with the training iterations, of the point position parameter s corresponding to the time point of the last iteration of the current target iteration interval; s^(t) is the point position parameter s corresponding to the time point of the last iteration of the current target iteration interval; and M^(t-1) is the moving average, increasing with the training iterations, of the point position parameter s corresponding to the time point of the last iteration of the previous target iteration interval.
  • the prediction time point may further include a second prediction time point.
  • The second prediction time point is determined according to the data variation range curve. Based on the data variation ranges observed over large amounts of data in neural network training, a data variation range curve such as the one shown in Figure 2-4a is obtained.
  • The data variation range curve shown in Figure 2-4a shows that, from the start of training to the T-th generation, the data variation range is very large each time the weights are updated.
  • At the second prediction time point, the current generation is first quantized with the data bit width n1 of the previous generation, and the corresponding quantization error is determined from the obtained quantization result and the corresponding data before quantization. The quantization error is compared with the first threshold and the second threshold respectively, and the data bit width n1 is adjusted according to the comparison result to obtain the data bit width n2. The data bit width n2 is then used to quantize the weight data to be quantized in the current generation.
  • The first prediction time point is determined according to the target iteration interval: at each first prediction time point it is determined whether the data bit width needs to be adjusted and how to adjust it, and the next target iteration interval is determined according to formula (1-8) to obtain the next first prediction time point. Since the weight data before and after each iteration vary greatly from the start of training to the T-th generation, the weight data of the corresponding layer in successive generations are not similar; to satisfy the accuracy requirement, the data of each layer of the current generation cannot continue to use the quantization parameters of the corresponding layer of the previous generation, so the data bit width can be adjusted generation by generation in the first T generations.
  • Alternatively, the target iteration interval of the first T generations can be preset according to the law revealed by the data variation range curve shown in Figure 2-4a: the target iteration interval of the first T generations is directly preset according to the curve, without using formula (1-8), and the time point at which the weight update iteration of each of the first T generations is completed is confirmed as a second prediction time point. This makes more reasonable use of the resources of the artificial intelligence processor chip.
  • The data variation range curve shown in Figure 2-4a shows little variation from the T-th generation onward, so in the middle and later stages of training the quantization parameters do not need to be reconfirmed in every generation. At the T-th or (T+1)-th generation, the quantization error is determined using the data before quantization and the data after quantization corresponding to the current generation.
  • According to the quantization error, it is determined whether the data bit width needs to be adjusted and how to adjust it, and the target iteration interval is determined according to formula (1-8). If the confirmed target iteration interval is 55 generations, then the time point 55 generations after the T-th or (T+1)-th generation is taken as the first prediction time point, where it is judged whether and how to adjust the data bit width, and the next target iteration interval is determined according to formula (1-8), thereby determining the next first prediction time point, and so on until all generations within the same period (epoch) are completed. On this basis, the data bit width or quantization parameters are adaptively adjusted after each epoch, and finally the quantized data is used to obtain a neural network whose accuracy meets expectations.
  • For example, suppose the value of T is determined to be 130 according to the weight data variation range curve shown in Fig. 2-4a (this value does not correspond to Fig. 2-4a; for convenience of description, T is merely assumed to be 130 and is not limited to this hypothetical value). Then the 130th generation in the training process serves as a second prediction time point, and the current first prediction time point is the 100th generation.
  • At the 100th generation, formula (1-8) determines the target iteration interval to be 35 generations. Within that target iteration interval, training reaches the 130th generation, i.e., the second prediction time point, and the target iteration interval determined at that point is 42 generations. The 130th to 172nd generations then serve as the target iteration interval; the 135th generation, which was the first prediction time point determined when the target iteration interval was 35 generations, falls within this 42-generation interval, and at the 135th generation it can again be judged according to formula (1-8) whether and how to adjust the data bit width.
  • In summary, the second predictive time point is preset in advance according to the data variation curve. At the preset second predictive time point, the data bit width is directly adjusted according to the quantization error, and the adjusted data bit width is used to quantize the data to be quantized in the current generation.
  • In the middle and late stages of training or fine-tuning, the target iteration interval is obtained according to formula (1-8) to determine the corresponding first predictive time point, and at each first predictive time point it is judged whether and how to adjust the data bit width. In this way, while the floating-point precision required by the neural network operation is satisfied, the resources of the artificial intelligence processor chip are used reasonably, which greatly improves the efficiency of quantization.
  • the step of determining the target iteration interval may include:
  • at the predictive time point, determine the change trend value of the point position parameter of the data to be quantized and the change trend value of the data bit width involved in the weight iteration process; the predictive time point is used to judge whether the data bit width needs to be adjusted, and
  • the predictive time point corresponds to the time point when the weight update iteration is completed;
  • The change trend value of the data bit width can be determined from the corresponding quantization error according to formula (1-9), in which diff bit is the quantization error, diff update2 is the change trend value of the data bit width, and the remaining coefficient is a hyperparameter. diff update2 measures the changing trend of the data bit width n used in quantization: the larger diff update2 is, the more likely it is that the fixed-point bit width needs to be updated, and an update frequency with a shorter interval is required.
  • The change trend value of the point position parameter can still be obtained according to formula (1-6), and M (t) in formula (1-6) can be obtained according to formula (1-7). diff update1 measures the change trend of the point position parameter s, because a change of s is also reflected, in disguised form, in a change of the maximum value Z max of the current data to be quantized. The larger diff update1 is, the more drastically the value range changes, and an update frequency with a shorter interval, that is, a smaller target iteration interval, is required.
  • the target iteration interval is determined according to equation (1-10).
  • the same data bit width is used in the quantization process within the same target iteration interval, and the data bit width used in the quantization process within different target iteration intervals may be the same or different.
  • In formula (1-10), I is the target iteration interval, β and γ are hyperparameters, diff update1 is the change trend value of the point position parameter, and diff update2 is the change trend value of the data bit width.
  • diff update1 is used to measure the change of the point position parameter s, but the change of s caused by a change of the data bit width n should be ignored, because that change of n is already reflected in diff update2 . If this neglect were not applied in diff update1 , the target iteration interval I determined according to formula (1-10) would be inaccurate, resulting in too many first predictive time points, so that during training or fine-tuning the operation of deciding whether and how to update the data bit width n would be performed too frequently, leading to unreasonable use of the resources of the artificial intelligence processor chip.
  • diff update1 is determined according to M (t) . Assume that the data bit width corresponding to the (t-1)-th predictive time point is n 1 , the corresponding point position parameter is s 1 , and the moving average of the point position parameter over the training iterations is m 1 . The data to be quantized is quantized with data bit width n 1 to obtain quantized fixed-point numbers. Suppose the data bit width used for quantization at the t-th predictive time point is n 2 . For M (t) , one of the following two optimization methods can be selected; a schematic sketch is given below.
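  • As an illustration only, the following minimal sketch shows how the trend values and the target iteration interval could be computed. The concrete forms below (an exponential moving average for M (t) , an absolute-difference form for diff update1 , a squared-error form for diff update2 , and I = β / max(diff update1 , diff update2 ) - γ) are assumptions consistent with the surrounding definitions, not the exact formulas (1-6) to (1-10) of this disclosure:

```python
# Hypothetical sketch; the functional forms are assumptions (see lead-in).
def moving_average(s_t, m_prev, alpha=0.9):
    """M(t): moving average of the point position s, cf. formula (1-7)."""
    return alpha * m_prev + (1 - alpha) * s_t

def trend_values(m_t, m_prev, diff_bit, delta=1.0):
    """diff_update1 (cf. formula (1-6)) and diff_update2 (cf. formula (1-9))."""
    diff_update1 = abs(m_t - m_prev)        # trend of the point position s
    diff_update2 = delta * diff_bit ** 2    # trend of the data bit width n
    return diff_update1, diff_update2

def target_iteration_interval(diff_update1, diff_update2, beta=10.0, gamma=2.0):
    """I, cf. formula (1-10): larger trend values give a shorter interval."""
    return max(int(beta / max(diff_update1, diff_update2, 1e-9) - gamma), 1)
```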
  • In actual applications, the data bit width n and the point position parameter s have a great influence on quantization, while the scaling factor f and the offset O in the quantization parameters have little influence. Therefore, regardless of whether the data bit width n changes, it is also very meaningful to determine a target iteration interval for the variable point position parameter s alone.
  • the process of determining the target iteration interval may include the following steps:
  • at the predictive time point, determine the change trend value of the point position parameter of the data to be quantized involved in the weight update iteration process; the predictive time point is the time point used to judge whether the quantization parameter needs to be adjusted, and it corresponds to the time point when the weight update iteration is completed;
  • here, the quantization parameter is preferably the point position parameter.
  • The data bit width or quantization parameters are adjusted so that appropriate quantization parameters are used at appropriate iteration time points, which enables the artificial intelligence processor chip to execute the neural network operation at fixed-point operation speed, improving the peak computing power of the chip while satisfying the floating-point precision required by the operation.
  • Although the steps in the flowchart of Fig. 2-1 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless specifically stated herein, the execution order of these steps is not strictly limited, and they may be executed in other orders. Moreover, at least some of the steps in Fig. 2-1 may include multiple sub-steps or stages, which are not necessarily executed at the same moment but may be executed at different moments; their execution order is likewise not necessarily sequential, and they may be performed in turn or alternately with at least part of the other steps, or of the sub-steps or stages of other steps.
  • the embodiment of the present disclosure also provides a non-volatile computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the above-mentioned neural network data quantization processing method is realized.
  • FIGS. 2-5 show a block diagram of a neural network data quantization processing device according to an embodiment of the present disclosure.
  • the device is applied to the processor 100 shown in FIG. 1, and the device includes a data statistics module 61, a quantization parameter determination module 62, and a quantization processing module 63.
  • a certain processing unit 101 is provided with a data statistics module 61, a quantization parameter determination module 62, and a quantization processing module 63.
  • the data statistics module 61, the quantization parameter determination module 62, and the quantization processing module 63 are respectively provided in different processing units 101.
  • the storage unit 102 is configured to store data related to the operation of the data statistics module 61, the quantization parameter determination module 62, and the quantization processing module 63, such as the data to be quantized, the statistical result, the quantization parameter, and the data bit width.
  • the data statistics module 61 counts the data to be quantified according to the corresponding layer and the number of channels in the corresponding layer in the neural network operation process, and determines the statistical result of each type of data to be quantified.
  • the quantization parameter determination module 62 determines the quantization parameter of each type of data to be quantized in the corresponding layer according to the statistical result and the data bit width.
  • the quantization processing module 63 uses the corresponding quantization parameter to quantize the data to be quantized.
  • the data to be quantized includes at least one of neurons, weights, and gradients of the neural network, and the quantization parameters include point position parameters, scaling coefficients, and offsets.
  • the quantization parameter and data bit width corresponding to each type of data to be quantized in the corresponding layer are the same.
  • the scaling factor and offset of the weights in the same channel of the corresponding layer are the same, and the point position parameter and data bit width corresponding to the weights in all channels of the corresponding layer are the same.
  • the quantization parameters and data bit widths corresponding to neurons in all channels of the corresponding layer are the same.
  • the quantization parameters and data bit widths corresponding to the gradients in all channels of the corresponding layer are the same.
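  • As an illustration only, the following sketch (with hypothetical function names) mirrors this channel-wise sharing rule for the weights of a multi-channel layer: each channel receives its own offset and scaling factor, while the point position and data bit width are shared; deriving the shared point position from the largest per-channel range is an assumption of the sketch:

```python
import math

# Sketch of channel-wise parameter sharing (assumptions noted in the lead-in).
def weight_channel_params(channels, n=8):
    # channels: list of per-channel weight arrays (e.g. numpy arrays)
    offsets = [(w.max() + w.min()) / 2.0 for w in channels]          # per channel
    half_ranges = [(w.max() - w.min()) / 2.0 for w in channels]
    s = math.ceil(math.log2(max(half_ranges) / (2 ** (n - 1) - 1)))  # shared s
    scales = [z / (2.0 ** s * (2 ** (n - 1) - 1)) for z in half_ranges]
    return s, scales, offsets
```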
  • the neural network operation process may include at least one operation of neural network training, neural network inference, and neural network fine-tuning.
  • the statistical result may include any one of the following: the maximum absolute value of each type of data to be quantized, and one half of the distance between the maximum value and the minimum value of each type of data to be quantized.
  • the maximum absolute value is the absolute value of the maximum or minimum value in each type of data to be quantized.
  • the scaling factor may be determined according to the point position parameter, statistical result, and data bit width.
  • the offset may be determined according to the statistical result of each type of data to be quantized.
  • the point position parameter may be determined according to the statistical result and the data bit width.
  • the data bit width may be a preset value.
  • the device may further include: a bit width adjustment module, which adjusts the data bit width according to the quantization error corresponding to the data bit width, so as to determine the quantization parameter by using the adjusted data bit width.
  • the quantization error is determined according to the quantized data in the corresponding layer and the corresponding data before quantization.
  • the bit width adjustment module may include: an adjustment sub-module, which compares the quantization error with a threshold, and adjusts the data bit width according to the comparison result.
  • the threshold includes at least one of the first threshold and the second threshold.
  • the quantization error is compared with the threshold, and the data bit width is adjusted according to the comparison result, which may include any of the following: when the quantization error is greater than or equal to the first threshold, the data bit width is increased; when the quantization error is less than or equal to the second threshold, the data bit width is reduced; when the quantization error is between the first threshold and the second threshold, the data bit width remains unchanged.
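  • As an illustration only, this threshold comparison can be sketched as follows (the adjustment step size is a hypothetical choice, not specified by this disclosure):

```python
def adjust_bit_width(n, quant_error, first_th, second_th, step=2):
    # Illustrative comparison with the two thresholds; step size is assumed.
    if quant_error >= first_th:
        return n + step     # error too large: increase the data bit width
    if quant_error <= second_th:
        return n - step     # error small enough: reduce the data bit width
    return n                # otherwise the data bit width remains unchanged
```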
  • the device may further include an inverse quantization processing module and a quantization error determination module.
  • the dequantization processing module dequantizes the quantized data to obtain dequantized data, wherein the data format of the dequantized data is the same as the data format of the corresponding data before quantization.
  • the quantization error determination module determines the quantization error according to the quantized data and the corresponding inverse quantization data.
  • the data before quantization may be data to be quantized.
  • the data before quantization may be the data to be quantified involved in the weight update iteration process within the target iteration interval.
  • the target iteration interval may include at least one weight update iteration, and the same data bit width is used in the quantization process within the same target iteration interval.
  • the target iteration interval may be determined according to the change trend value of the point position parameter of the data to be quantified involved in the weight value update iteration process at the pre-determined time point.
  • the target iteration interval may be determined according to the change trend value of the point position parameter of the data to be quantified and the change trend value of the data bit width involved in the weight update iteration process at the predicted time point.
  • the pre-determined time point may be a time point for judging whether the data bit width needs to be adjusted, and the pre-determined time point corresponds to the time point when the weight update iteration is completed.
  • the change trend value of the point position parameter may be determined according to the sliding average value of the point position parameter corresponding to the current predicted time point and the sliding average value of the point position parameter corresponding to the last predicted time point. of.
  • the change trend value of the point position parameter may be determined based on the sliding average value of the point position parameter corresponding to the current predictive time point and the point position parameter corresponding to the previous predictive time point.
  • the change trend value of the data bit width may be determined according to the corresponding quantization error.
  • the device may further include a first sliding average determination module.
  • the first moving average determination module is configured to: determine the point position parameter corresponding to the current predictive time point according to the point position parameter corresponding to the previous predictive time point and the adjustment value of the data bit width; adjust the moving average of the point position parameter corresponding to the previous predictive time point according to the adjustment value of the data bit width to obtain an adjustment result; and determine the moving average of the point position parameter corresponding to the current predictive time point according to the point position parameter corresponding to the current predictive time point and the adjustment result.
  • the device may further include a second sliding average determination module.
  • the second moving average determination module is configured to: determine an intermediate result of the moving average of the point position parameter corresponding to the current predictive time point according to the moving average of the point position parameter corresponding to the previous predictive time point and the point position parameter corresponding to the previous predictive time point;
  • and determine the moving average of the point position parameter corresponding to the current predictive time point according to that intermediate result and the adjustment value of the data bit width. A sketch of both variants follows.
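  • As an illustration only, the two variants can be sketched as follows; shifting the point position and its moving average by the bit-width adjustment value is an assumption of the sketch, not a form given by this disclosure:

```python
# Hypothetical sketch of the two moving-average variants (see lead-in).
def first_variant(s_prev, m_prev, delta_n, alpha=0.9):
    s_cur = s_prev - delta_n               # point position at the current time point
    m_adj = m_prev - delta_n               # adjusted previous moving average
    return alpha * m_adj + (1 - alpha) * s_cur

def second_variant(s_prev, m_prev, delta_n, alpha=0.9):
    m_mid = alpha * m_prev + (1 - alpha) * s_prev   # intermediate result
    return m_mid - delta_n                 # apply the bit-width adjustment value
```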
  • the data before quantization may be the data to be quantized involved in the weight update iteration within the target iteration interval.
  • the target iteration interval may include at least one weight update iteration, and the same quantization parameter is used in the quantization process within the same target iteration interval.
  • the target iteration interval is determined according to the change trend value of the point position parameter of the data to be quantified involved in the weight update iteration process at the predicted time point.
  • the predicted time point is the time point used to determine whether the quantization parameter needs to be adjusted, and the predicted time point corresponds to the time point when the weight update iteration is completed.
  • The neural network data quantization processing device uses the corresponding quantization parameters to quantize the data to be quantized, which, while ensuring accuracy, reduces the storage space occupied by stored data, ensures the accuracy and reliability of the calculation results, and improves calculation efficiency. Quantization also reduces the size of the neural network model and lowers the performance requirements on the terminal running it, so that the model can be applied to terminals such as mobile phones whose computing power, volume, and power consumption are relatively limited.
  • Clause A1 A neural network data quantization processing method applied to a processor, the method comprising: counting the data to be quantized according to the corresponding layer and the number of channels of the corresponding layer in the neural network operation process, and determining the statistical result of each type of data to be quantized; determining the quantization parameter of each type of data to be quantized in the corresponding layer according to the statistical result and the data bit width; and quantizing the data to be quantized using the corresponding quantization parameter,
  • wherein the data to be quantized includes at least one of neurons, weights, and gradients of the neural network,
  • and the quantization parameters include point position parameters, scaling factors, and offsets.
  • Clause A2 The method according to Clause A1, wherein when the channel of the corresponding layer is a single channel or the corresponding layer has no channel, the quantization parameter and data bit width corresponding to each type of data to be quantized in the corresponding layer are the same.
  • Clause A3 The method according to Clause A1, wherein when the corresponding layer has multiple channels, the scaling factors and offsets of the weights in the same channel of the corresponding layer are the same; the point position parameter and data bit width corresponding to the weights in all channels of the corresponding layer are the same; the quantization parameters and data bit widths corresponding to the neurons in all channels of the corresponding layer are the same; and the quantization parameters and data bit widths corresponding to the gradients in all channels of the corresponding layer are the same.
  • Clause A4 The method according to any one of clauses A1 to A3, wherein the neural network operation process includes at least one operation of neural network training, neural network inference, and neural network fine-tuning.
  • Clause A5 The method according to any one of Clauses A1 to A3, wherein the statistical result includes any one of the following: the maximum absolute value of each type of data to be quantized, or one half of the distance between the maximum value and the minimum value of each type of data to be quantized,
  • wherein the maximum absolute value is the absolute value of the maximum value or the minimum value in each type of data to be quantized.
  • Clause A6 The method according to any one of clauses A1 to A3, wherein the scaling factor is determined according to the point position parameter, the statistical result, and the data bit width.
  • the offset is determined according to the statistical result of each type of data to be quantified.
  • the data bit width is a preset value.
  • the data bit width is adjusted according to the quantization error corresponding to the data bit width, so as to determine the quantization parameter using the adjusted data bit width,
  • the quantization error is determined according to the quantized data in the corresponding layer and the corresponding data before quantization.
  • adjusting the data bit width according to the quantization error corresponding to the data bit width includes:
  • the quantization error is compared with a threshold, and the data bit width is adjusted according to the comparison result,
  • the threshold includes at least one of a first threshold and a second threshold.
  • the quantization error is compared with a threshold, and the data bit width is adjusted according to the comparison result, including any of the following: when the quantization error is greater than or equal to the first threshold, the data bit width is increased; when the quantization error is less than or equal to the second threshold, the data bit width is reduced;
  • when the quantization error is between the first threshold and the second threshold, the data bit width remains unchanged.
  • the quantization error is determined according to the quantized data and corresponding inverse quantization data.
  • Clause A14 The method according to clause A10, wherein the data before quantization is the data to be quantized.
  • Clause A15 The method according to clause A10, wherein the data before quantification is the data to be quantified involved in the weight update iteration process within the target iteration interval;
  • the target iteration interval includes at least one weight update iteration, and the same data bit width is used in the quantization process within the same target iteration interval.
  • the target iteration interval is determined according to the change trend value of the point position parameter of the data to be quantified involved in the weight update iteration process at the pre-determined time point,
  • the target iteration interval is determined according to the change trend value of the point position parameter of the data to be quantified and the change trend value of the data bit width involved in the weight update iteration process at the predicted time point,
  • the predetermined time point is a time point used to determine whether the data bit width needs to be adjusted, and the predetermined time point corresponds to the time point when the weight update iteration is completed.
  • the change trend value of the point position parameter is determined according to the moving average of the point position parameter corresponding to the current predictive time point and the moving average of the point position parameter corresponding to the previous predictive time point,
  • the change trend value of the point position parameter is determined according to the sliding average value of the point position parameter corresponding to the current predictive time point and the point position parameter corresponding to the previous predictive time point,
  • the change trend value of the data bit width is determined according to the corresponding quantization error.
  • the sliding average value of the point position parameter corresponding to the current prediction time point is determined according to the intermediate result of the sliding average value of the point position parameters corresponding to the current prediction time point and the adjustment value of the data bit width.
  • Clause A20 The method according to clause A10, wherein the data before quantification is the data to be quantified involved in the weight update iteration within the target iteration interval; wherein the target iteration interval includes at least one weight update iteration, And the same quantization parameter is used in the quantization process within the same target iteration interval,
  • the target iteration interval is determined according to the change trend value of the point position parameter of the data to be quantified involved in the weight update iteration process at the predicted time point
  • the predetermined time point is a time point used to determine whether the quantization parameter needs to be adjusted, and the predetermined time point corresponds to a time point when the weight update iteration is completed.
  • a neural network data quantization processing device applied to a processor, the device comprising:
  • the data statistics module counts the data to be quantified according to the corresponding layer and the number of channels in the corresponding layer in the neural network operation process, and determines the statistical result of each data to be quantified;
  • the quantization parameter determination module determines the quantization parameter of each type of data to be quantized in the corresponding layer according to the statistical result and the data bit width;
  • a quantization processing module uses corresponding quantization parameters to quantize the data to be quantized
  • the data to be quantized includes at least one of neurons, weights, and gradients of the neural network
  • the quantization parameters include point position parameters, scaling factors, and offsets.
  • Clause A24 The device according to any one of clauses A21 to A23, wherein the neural network operation process includes at least one operation of neural network training, neural network inference, and neural network fine-tuning.
  • Clause A25 The device according to any one of Clauses A21 to A23, wherein the statistical result includes any one of the following: the maximum absolute value of each type of data to be quantized, or one half of the distance between the maximum value and the minimum value of each type of data to be quantized,
  • wherein the maximum absolute value is the absolute value of the maximum value or the minimum value in each type of data to be quantized.
  • Clause A29 The device according to any one of clauses A21 to A23, wherein the data bit width is a preset value.
  • the bit width adjustment module adjusts the data bit width according to the quantization error corresponding to the data bit width to determine the quantization parameter using the adjusted data bit width
  • the quantization error is determined according to the quantized data in the corresponding layer and the corresponding data before quantization.
  • bit width adjustment module includes:
  • the adjustment sub-module compares the quantization error with a threshold, and adjusts the data bit width according to the comparison result,
  • the threshold includes at least one of a first threshold and a second threshold.
  • Clause A32 The device according to Clause A31, wherein the quantization error is compared with a threshold and the data bit width is adjusted according to the comparison result, including any of the following: when the quantization error is greater than or equal to the first threshold, the data bit width is increased; when the quantization error is less than or equal to the second threshold, the data bit width is reduced;
  • when the quantization error is between the first threshold and the second threshold, the data bit width remains unchanged.
  • the inverse quantization processing module performs inverse quantization on the quantized data to obtain inverse quantization data, wherein the data format of the inverse quantization data is the same as the data format of the corresponding data before quantization;
  • the quantization error determination module determines the quantization error according to the quantized data and corresponding inverse quantization data.
  • Clause A34 The device according to clause A30, wherein the data before quantization is the data to be quantized.
  • Clause A35 The device according to clause A30, wherein the data before quantification is the data to be quantified involved in the weight update iteration process within the target iteration interval;
  • the target iteration interval includes at least one weight update iteration, and the same data bit width is used in the quantization process within the same target iteration interval.
  • the target iteration interval is determined according to the change trend value of the point position parameter of the data to be quantified and the change trend value of the data bit width involved in the weight update iteration process at the predicted time point,
  • the predetermined time point is a time point used to determine whether the data bit width needs to be adjusted, and the predetermined time point corresponds to the time point when the weight update iteration is completed.
  • the change trend value of the point position parameter is determined according to the sliding average value of the point position parameter corresponding to the current predictive time point and the point position parameter corresponding to the previous predictive time point,
  • the change trend value of the data bit width is determined according to the corresponding quantization error.
  • the first moving average determination module determines the point position parameter corresponding to the current predictive time point according to the point position parameter corresponding to the previous predictive time point and the adjustment value of the data bit width; adjusts the moving average of the point position parameter corresponding to the previous predictive time point according to the adjustment value of the data bit width to obtain an adjustment result; and determines the moving average of the point position parameter corresponding to the current predictive time point according to the point position parameter corresponding to the current predictive time point and the adjustment result;
  • the second moving average determination module determines the moving average of the point position parameter corresponding to the current predictive time point according to the moving average of the point position parameter corresponding to the last predictive time point and the point position parameter corresponding to the last predictive time point Intermediate result of value;
  • the sliding average value of the point position parameter corresponding to the current prediction time point is determined according to the intermediate result of the sliding average value of the point position parameters corresponding to the current prediction time point and the adjustment value of the data bit width.
  • Clause A40 The device according to Clause A30, wherein the data before quantification is data to be quantified involved in the weight update iteration within the target iteration interval; wherein the target iteration interval includes at least one weight update iteration, And the same quantization parameter is used in the quantization process within the same target iteration interval,
  • the target iteration interval is determined according to the change trend value of the point position parameter of the data to be quantified involved in the weight update iteration process at the predicted time point
  • the predetermined time point is a time point used to determine whether the quantization parameter needs to be adjusted, and the predetermined time point corresponds to a time point when the weight update iteration is completed.
  • Clause A41 An artificial intelligence chip including the neural network data quantization processing device according to any one of Clauses A21 to A40.
  • Clause A42 An electronic device comprising the artificial intelligence chip as described in Clause A41.
  • Clause A43 A board card comprising: a storage device, an interface device, a control device, and the artificial intelligence chip according to Clause A41;
  • the artificial intelligence chip is connected to the storage device, the control device and the interface device respectively;
  • the storage device is used to store data
  • the interface device is used to implement data transmission between the artificial intelligence chip and external equipment
  • the control device is used to monitor the state of the artificial intelligence chip.
  • Clause A44 The board card according to Clause A43, wherein the storage device includes multiple groups of storage units, each group of storage units being connected to the artificial intelligence chip through a bus, the storage unit being DDR SDRAM;
  • the chip includes: a DDR controller for controlling data transmission and data storage of each storage unit;
  • the interface device is: a standard PCIE interface.
  • Clause A45 A non-volatile computer-readable storage medium on which computer program instructions are stored, which, when executed by a processor, implement the neural network data described in any one of Clauses A1 to A20 Quantification processing method.
  • The operation data involved in the recurrent neural network operation process can be quantized, that is, operation data represented in floating point is converted into operation data represented in fixed point, thereby reducing the storage space required of the storage device, improving memory access efficiency, and improving the computing efficiency of the computing device.
  • the traditional quantization method is to use the same data bit width and quantization parameters (such as the position of the decimal point) to quantify the different operation data of the recurrent neural network during the entire training process of the recurrent neural network.
  • FIG. 3-1 is a structural block diagram of the quantization parameter adjustment device 100', wherein the processor 120 of the quantization parameter adjustment device 100' may be a general-purpose processor or an artificial intelligence processor,
  • or the processor of the quantization parameter adjustment device 100' may include both a general-purpose processor and an artificial intelligence processor, which is not specifically limited here.
  • the memory 110 may be used to store operation data in a cyclic neural network operation process, and the operation data may be one or more of neuron data, weight data, or gradient data.
  • the memory 110 may also be used to store a computer program.
  • When the computer program is executed by the above-mentioned processor 120, it can implement the quantization parameter adjustment method of the embodiments of the present disclosure.
  • This method can be applied to the training or fine-tuning process of a recurrent neural network to dynamically adjust the quantization parameters of the operation data according to the distribution characteristics of the operation data at different stages of training or fine-tuning, thereby improving
  • the accuracy of the quantization process of the recurrent neural network and ensuring the accuracy and reliability of the calculation results.
  • the artificial intelligence processor may be any appropriate hardware processor, such as CPU, GPU, FPGA, DSP, ASIC, and so on.
  • the memory can be any suitable magnetic storage medium or magneto-optical storage medium, such as RRAM (Resistive Random Access Memory), Dynamic Random Access Memory (DRAM), static Random access memory SRAM (Static Random-Access Memory), enhanced dynamic random access memory EDRAM (Enhanced Dynamic Random Access Memory), high bandwidth memory HBM (High-Bandwidth Memory), or hybrid storage cube HMC (Hybrid Memory Cube), etc.
  • quantization refers to converting operation data in a first data format into operation data in a second data format.
  • the arithmetic data in the first data format may be floating-point arithmetic data
  • the arithmetic data in the second data format may be fixed-point arithmetic data. Since floating-point operation data usually occupies a large storage space, converting the floating-point operation data into fixed-point operation data can save storage space and improve the memory access efficiency and computing efficiency of the operation data.
  • the quantization parameter in the quantization process may include a point position and/or a scaling factor, where the point position refers to the position of the decimal point in the quantized operation data.
  • the scaling factor refers to the ratio between the maximum value of the quantized data and the maximum absolute value of the data to be quantized.
  • the quantization parameter may also include an offset.
  • the offset is used for asymmetric data to be quantized and refers to an intermediate value of the multiple elements in the data to be quantized; specifically, the offset may be the midpoint value of the multiple elements in the data to be quantized.
  • the quantization parameter may not include an offset. In this case, quantization parameters such as point positions and/or scaling coefficients can be determined according to the data to be quantized.
  • Figure 3-2 shows a schematic diagram of the correspondence between the data to be quantized and the quantized data according to an embodiment of the present disclosure.
  • the data to be quantized is symmetric with respect to the origin. Assume that Z 1 is the maximum absolute value of the elements in the data to be quantized,
  • that n is the data bit width corresponding to the data to be quantized,
  • and that A is the maximum value that can be represented by the quantized data after quantizing the data to be quantized with data bit width n,
  • where A is 2^s × (2^(n-1) - 1).
  • A needs to include Z 1 ,
  • and Z 1 must be greater than A/2. Therefore, there is the constraint of formula (2-1): 2^s × (2^(n-1) - 1) ≥ Z 1 > 2^(s-1) × (2^(n-1) - 1).
  • the processor can calculate the point position s according to the maximum absolute value Z1 in the data to be quantized and the data bit width n.
  • the following formula (2-2) can be used to calculate the point position s corresponding to the data to be quantized: s = ceil( log2( Z 1 / (2^(n-1) - 1) ) )  (2-2),
  • where ceil denotes rounding up,
  • Z 1 is the maximum absolute value in the data to be quantized,
  • s is the point position,
  • and n is the data bit width.
  • the floating-point representation of the data to be quantized F x can be expressed as: F x ⁇ I x ⁇ 2 s , where I x refers to the quantized n-bit binary representation Value, s represents the point position.
  • the quantized data corresponding to the data to be quantized is: I x = round( F x / 2^s )  (2-3),
  • where s is the point position,
  • I x is the quantized data,
  • F x is the data to be quantized,
  • and round denotes the rounding operation. It is understandable that other rounding methods can also be used, for example rounding up, rounding down, or rounding toward zero, to replace the round operation in formula (2-3). It can be understood that, for a given data bit width, the more digits after the decimal point in the quantized data obtained according to the point position, the greater the quantization precision of the quantized data.
  • the intermediate representation data F x1 corresponding to the data to be quantized may be: F x1 = round( F x / 2^s ) × 2^s  (2-4),
  • where F x1 may be obtained by inverse quantization of the above quantized data I x ; the data format of the intermediate representation data F x1 is consistent with the data format of the above data to be quantized F x , and the intermediate representation data F x1 may be used for calculating the quantization error, as detailed below.
  • inverse quantization refers to the inverse process of quantization.
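  • As an illustration only, the following minimal sketch implements the point-position quantization round trip under the forms of formulas (2-2) to (2-4) as reconstructed above:

```python
import math
import numpy as np

def point_position(z1, n):
    return math.ceil(math.log2(z1 / (2 ** (n - 1) - 1)))   # formula (2-2)

def quantize(fx, s):
    return np.round(fx / 2.0 ** s)                         # formula (2-3)

def dequantize(ix, s):
    return ix * 2.0 ** s                                   # formula (2-4)

fx = np.array([-1.62, 0.25, 0.91])
s = point_position(np.max(np.abs(fx)), n=8)
ix = quantize(fx, s)
fx1 = dequantize(ix, s)   # intermediate representation used for the quantization error
```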
  • the scaling factor may include a first scaling factor, and the first scaling factor may be calculated as follows: f 1 = Z 1 / A = Z 1 / ( 2^s × (2^(n-1) - 1) )  (2-5),
  • where Z 1 is the maximum absolute value of the data to be quantized,
  • and A is the maximum value that can be represented by the quantized data after quantizing the data to be quantized with data bit width n,
  • A being 2^s × (2^(n-1) - 1).
  • the processor can quantize the data to be quantized F x by combining the point position and the first scaling factor to obtain the quantized data: I x = round( F x / ( 2^s × f 1 ) )  (2-6),
  • where s is the point position determined according to the above formula (2-2),
  • f 1 is the first scaling factor,
  • I x is the quantized data,
  • F x is the data to be quantized,
  • and round denotes the rounding operation. It is understandable that other rounding methods can also be used, for example rounding up, rounding down, or rounding toward zero, to replace the round operation in formula (2-6).
  • the intermediate representation data F x1 corresponding to the data to be quantized may be: F x1 = round( F x / ( 2^s × f 1 ) ) × 2^s × f 1  (2-7),
  • where F x1 may be obtained by inverse quantization of the above quantized data I x ; the data format of the intermediate representation data F x1 is consistent with the data format of the above data to be quantized F x , and the intermediate representation data F x1 may be used for calculating the quantization error, as detailed below.
  • inverse quantization refers to the inverse process of quantization.
  • the scaling factor may also include a second scaling factor, and the second scaling factor may be calculated as follows: f 2 = Z 1 / (2^(n-1) - 1)  (2-8).
  • the processor may use the second scaling factor alone to quantize the data to be quantized F x to obtain the quantized data: I x = round( F x / f 2 )  (2-9),
  • where f 2 is the second scaling factor,
  • I x is the quantized data,
  • F x is the data to be quantized,
  • and round denotes the rounding operation. It is understandable that other rounding methods can also be used, for example rounding up, rounding down, or rounding toward zero, to replace the round operation in formula (2-9). It is understandable that, when the data bit width is constant, different scaling factors can be used to adjust the numerical range of the quantized data.
  • the intermediate representation data F x1 corresponding to the data to be quantized may be: F x1 = round( F x / f 2 ) × f 2  (2-10),
  • where F x1 may be obtained by inverse quantization of the above quantized data I x ; the data format of the intermediate representation data F x1 is consistent with the data format of the above data to be quantized F x , and the intermediate representation data F x1 may be used for calculating the quantization error, as detailed below.
  • inverse quantization refers to the inverse process of quantization.
  • the above-mentioned second scaling factor may be determined according to the point position and the first scaling factor f 1 , that is, the second scaling factor can be calculated according to the following formula: f 2 = 2^s × f 1  (2-11),
  • where s is the point position determined according to the above formula (2-2), and f 1 is the first scaling factor calculated according to the above formula (2-5).
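  • As an illustration only, the scaling-factor variants can be sketched as follows, under the forms of formulas (2-5), (2-6), (2-8), (2-9) and (2-11) as reconstructed above:

```python
import numpy as np

def first_scaling_factor(z1, s, n):
    return z1 / (2.0 ** s * (2 ** (n - 1) - 1))   # formula (2-5): f1 = Z1 / A

def quantize_with_point_and_f1(fx, s, f1):
    return np.round(fx / (2.0 ** s * f1))         # formula (2-6)

def second_scaling_factor(s, f1):
    return 2.0 ** s * f1                          # formula (2-11): f2 = 2^s * f1

def quantize_with_f2(fx, f2):
    return np.round(fx / f2)                      # formula (2-9)
```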
  • the quantization method of the embodiment of the present disclosure can not only realize the quantization of symmetric data, but also realize the quantization of asymmetric data.
  • the processor can convert asymmetric data into symmetric data to avoid data "overflow".
  • the quantization parameter may also include an offset
  • the offset may be a midpoint value of the data to be quantized
  • the offset may be used to indicate the offset of the midpoint value of the data to be quantized relative to the origin.
  • Figure 3-3 shows a schematic diagram of the conversion of the data to be quantized according to an embodiment of the present disclosure.
  • the processor can perform statistics on the data distribution of the data to be quantized to obtain the minimum value Z min and the maximum value Z max among all the elements in the data to be quantized, and then the processor may calculate the above-mentioned offset according to the minimum value Z min and the maximum value Z max .
  • the specific offset calculation method is as follows: o = ( Z max + Z min ) / 2  (2-12),
  • where o represents the offset,
  • Z min represents the minimum value among all the elements of the data to be quantized,
  • and Z max represents the maximum value among all the elements of the data to be quantized.
  • the processor may determine the maximum absolute value Z 2 in the shifted data to be quantized according to the minimum value Z min and the maximum value Z max of all elements of the data to be quantized: Z 2 = ( Z max - Z min ) / 2  (2-13),
  • the processor can translate the data to be quantized according to the offset o, and convert the asymmetric data to be quantized into symmetric data to be quantized, as shown in Figure 3-3.
  • the processor can further determine the point position s according to the maximum absolute value Z 2 in the data to be quantized, where the point position can be calculated according to the following formula: s = ceil( log2( Z 2 / (2^(n-1) - 1) ) )  (2-14),
  • where ceil denotes rounding up,
  • s is the point position,
  • and n is the data bit width.
  • the processor can quantize the data to be quantized according to the offset and the corresponding point position to obtain the quantized data: I x = round( ( F x - o ) / 2^s )  (2-15),
  • where s is the point position determined according to the above formula (2-14),
  • o is the offset,
  • I x is the quantized data,
  • F x is the data to be quantized,
  • and round denotes the rounding operation. It is understandable that other rounding methods can also be used, for example rounding up, rounding down, or rounding toward zero, to replace the round operation in formula (2-15).
  • the intermediate representation data F x1 corresponding to the data to be quantized may be: F x1 = round( ( F x - o ) / 2^s ) × 2^s + o  (2-16),
  • where F x1 may be obtained by inverse quantization of the above quantized data I x ; the data format of the intermediate representation data F x1 is consistent with the data format of the above data to be quantized F x , and the intermediate representation data F x1 may be used for calculating the quantization error, as detailed below.
  • inverse quantization refers to the inverse process of quantization.
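  • As an illustration only, the offset-based (asymmetric) quantization round trip can be sketched as follows, under the forms of formulas (2-12) to (2-16) as reconstructed above:

```python
import math
import numpy as np

def offset_and_point_position(fx, n):
    z_min, z_max = float(np.min(fx)), float(np.max(fx))
    o = (z_max + z_min) / 2.0                          # formula (2-12)
    z2 = (z_max - z_min) / 2.0                         # formula (2-13)
    s = math.ceil(math.log2(z2 / (2 ** (n - 1) - 1)))  # formula (2-14)
    return o, s

def quantize_with_offset(fx, o, s):
    return np.round((fx - o) / 2.0 ** s)               # formula (2-15)

def dequantize_with_offset(ix, o, s):
    return ix * 2.0 ** s + o                           # formula (2-16)
```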
  • further, the processor may determine the point position s and the first scaling factor f 1 according to the maximum absolute value Z 2 in the data to be quantized, where the point position s is calculated according to the above formula (2-14).
  • the first scaling factor f 1 can be calculated according to the following formula: f 1 = Z 2 / A = Z 2 / ( 2^s × (2^(n-1) - 1) )  (2-17).
  • the processor can quantize the data to be quantized according to the offset and its corresponding first scaling factor f 1 and point position s to obtain the quantized data: I x = round( ( F x - o ) / ( 2^s × f 1 ) )  (2-18),
  • where f 1 is the first scaling factor,
  • s is the point position determined according to the above formula (2-14),
  • o is the offset,
  • I x is the quantized data,
  • F x is the data to be quantized,
  • and round denotes the rounding operation. It is understandable that other rounding methods can also be used, for example rounding up, rounding down, or rounding toward zero, to replace the round operation in formula (2-18).
  • the intermediate representation data F x1 corresponding to the data to be quantized may be: F x1 = round( ( F x - o ) / ( 2^s × f 1 ) ) × 2^s × f 1 + o  (2-19),
  • where f 1 is the first scaling factor,
  • s is the point position determined according to the above formula (2-14),
  • o is the offset,
  • F x is the data to be quantized,
  • and round denotes the rounding operation.
  • F x1 may be obtained by inverse quantization of the above quantized data I x ; the data format of the intermediate representation data F x1
  • is consistent with the data format of the data to be quantized F x ,
  • and the intermediate representation data F x1 may be used for calculating the quantization error, as detailed below.
  • inverse quantization refers to the inverse process of quantization.
  • the scaling factor may also include a second scaling factor, and the second scaling factor may be calculated as follows: f 2 = Z 2 / (2^(n-1) - 1)  (2-20).
  • the processor may use the second scaling factor alone to quantize the data to be quantized F x to obtain the quantized data: I x = round( ( F x - o ) / f 2 )  (2-21),
  • where f 2 is the second scaling factor,
  • I x is the quantized data,
  • F x is the data to be quantized,
  • and round denotes the rounding operation. It is understandable that other rounding methods can also be used, for example rounding up, rounding down, or rounding toward zero, to replace the round operation in formula (2-21). It is understandable that, when the data bit width is constant, different scaling factors can be used to adjust the numerical range of the quantized data.
  • the intermediate representation data F x1 corresponding to the data to be quantized may be: F x1 = round( ( F x - o ) / f 2 ) × f 2 + o  (2-22),
  • where F x1 may be obtained by inverse quantization of the above quantized data I x ; the data format of the intermediate representation data F x1 is consistent with the data format of the above data to be quantized F x , and the intermediate representation data F x1 may be used for calculating the quantization error, as detailed below.
  • inverse quantization refers to the inverse process of quantization.
  • the above-mentioned second scaling factor may be determined according to the point position and the first scaling factor f 1 , that is, the second scaling factor can be calculated according to the following formula: f 2 = 2^s × f 1  (2-23),
  • where s is the point position determined according to the above formula (2-14),
  • and f 1 is the first scaling factor calculated according to the above formula (2-17).
  • the processor may also quantize the data to be quantized according to the offset o alone, in which case the point position s and/or the scaling factor may be preset values. At this time, the processor quantizes the data to be quantized according to the offset to obtain the quantized data: I x = round( F x - o )  (2-24),
  • where o is the offset,
  • I x is the quantized data,
  • F x is the data to be quantized,
  • and round denotes the rounding operation. It is understandable that other rounding methods can also be used, for example rounding up, rounding down, or rounding toward zero, to replace the round operation in formula (2-24). It is understandable that, when the data bit width is constant, different offsets can be used to adjust the offset between the value of the quantized data and the data before quantization.
  • the intermediate representation data F x1 corresponding to the data to be quantized may be: F x1 = round( F x - o ) + o  (2-25),
  • where F x1 may be obtained by inverse quantization of the above quantized data I x ; the data format of the intermediate representation data F x1 is consistent with the data format of the above data to be quantized F x , and the intermediate representation data F x1 may be used for calculating the quantization error, as detailed below.
  • inverse quantization refers to the inverse process of quantization.
  • the quantization operation of the present disclosure can be used not only for the quantization of the floating point data described above, but also for realizing the quantization of fixed-point data.
  • the arithmetic data in the first data format may also be fixed-point arithmetic data, and the arithmetic data in the second data format may likewise be fixed-point arithmetic data, where the data representation range of the second data format is smaller than that of the first data format and the number of decimal places of the second data format is greater than that of the first data format, that is, the operation data in the second data format has higher precision than the operation data in the first data format.
  • for example, the arithmetic data in the first data format may be fixed-point data occupying 16 bits, and the second data format may be fixed-point data occupying 8 bits.
  • in this way, quantization processing can be performed on operation data represented in fixed point, thereby further reducing the storage space occupied by the operation data and improving the memory access efficiency and computing efficiency of the operation data.
  • the method for adjusting the quantization parameters of the recurrent neural network can be applied to the training or fine-tuning process of the recurrent neural network, so as to dynamically adjust the quantization parameters of the operation data during the training or fine-tuning process of the recurrent neural network,
  • thereby improving the quantization accuracy of the recurrent neural network.
  • the recurrent neural network may be a deep recurrent neural network or a convolutional recurrent neural network, etc., which is not specifically limited here.
  • an iterative operation generally includes a forward operation, a reverse operation and a weight update operation.
  • Forward operation refers to the process of forward inference based on the input data of the recurrent neural network to obtain the result of the forward operation.
  • the reverse operation is a process of determining the loss value according to the result of the forward operation and the preset reference value, and determining the weight gradient value and/or the input data gradient value according to the loss value.
  • the weight update operation refers to the process of adjusting the weight of the recurrent neural network according to the gradient of the weight.
  • the training process of the recurrent neural network is as follows: the processor may use a recurrent neural network whose weights are random numbers to perform a forward operation on the input data to obtain a forward operation result. The processor then determines the loss value according to the forward operation result and the preset reference value, and determines the weight gradient value and/or the input data gradient value according to the loss value. Finally, the processor can update the weights of the recurrent neural network according to the weight gradient values, obtain new weights, and complete one iterative operation.
  • the processor cyclically executes multiple iterative operations until the forward operation result of the cyclic neural network meets the preset condition. For example, when the forward operation result of the recurrent neural network converges to the preset reference value, the training ends. Or, when the loss value determined by the forward operation result of the recurrent neural network and the preset reference value is less than or equal to the preset accuracy, the training ends.
  • Fine-tuning refers to the process of performing multiple iterative operations on the cyclic neural network (the weight of the cyclic neural network is already in a convergent state rather than a random number), so that the accuracy of the cyclic neural network can meet the preset requirements.
  • This fine-tuning process is basically the same as the above-mentioned training process, and can be regarded as a process of retraining the recurrent neural network in a convergent state.
  • Inference refers to the process of using cyclic neural networks whose weights meet preset conditions to perform forward operations to realize functions such as recognition or classification, such as the use of cyclic neural networks for image recognition and so on.
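  • As an illustration only, the following toy sketch shows the shape of one such iterative flow for a simple linear model (purely schematic, not the recurrent network of this disclosure):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(3)             # weights initialized to random numbers
x, ref = rng.standard_normal(3), 1.0   # input data and preset reference value
for _ in range(1000):                  # multiple iterative operations
    y = w @ x                          # forward operation
    loss = (y - ref) ** 2              # loss value from result and reference
    grad = 2.0 * (y - ref) * x         # reverse operation: weight gradient value
    w -= 0.01 * grad                   # weight update operation
    if loss <= 1e-8:                   # training ends when accuracy is met
        break
```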
  • FIG. 3-4 shows a flowchart of a method for adjusting quantization parameters of a recurrent neural network according to an embodiment of the present disclosure. As shown in Figure 3-4, the above method may include steps S100 to S200.
  • in step S100, the data variation range of the data to be quantized is obtained.
  • the processor may directly read the data variation range of the data to be quantized, and the data variation range of the data to be quantized may be input by the user.
  • the processor may also calculate the data variation range of the data to be quantified according to the data to be quantified in the current inspection iteration and the data to be quantified in the historical iteration.
  • the current inspection iteration refers to the iterative operation currently performed
  • the historical iteration refers to the iterative operation performed before the current inspection iteration.
  • the processor can obtain the maximum value and the average value of the elements in the data to be quantized in the current inspection iteration, as well as the maximum value and the average value of the elements in the data to be quantized in each historical iteration, and determine the variation range of the data to be quantized according to the maximum value and the average value of the elements in each iteration.
  • the data variation range of the data to be quantified can be represented by the moving average or variance of the data to be quantified, which is not specifically limited here.
  • the data variation range of the data to be quantized can be used to determine whether the quantization parameter of the data to be quantized needs to be adjusted. For example, if the data to be quantized varies greatly, it can be explained that the quantization parameters need to be adjusted in time to ensure the quantization accuracy. If the data change range of the data to be quantified is small, the quantization parameter of the historical iteration can be used for the current inspection iteration and a certain number of iterations thereafter, thereby avoiding frequent adjustment of the quantization parameter and improving the quantization efficiency.
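  • As an illustration only, the following sketch tracks such a variation measure; the text above allows a moving average or a variance, and the concrete choice here (a moving average of the per-iteration maximum absolute value) is an assumption:

```python
import numpy as np

class VariationTracker:
    def __init__(self, alpha=0.9):
        self.alpha, self.ma = alpha, None
    def update(self, data):
        peak = float(np.max(np.abs(data)))        # extremum of this iteration
        prev = peak if self.ma is None else self.ma
        self.ma = self.alpha * prev + (1 - self.alpha) * peak   # moving average
        return abs(peak - prev)                   # variation vs. history
```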
  • each iteration involves at least one data to be quantized
  • the data to be quantized may be arithmetic data represented by a floating point or a fixed point.
  • the data to be quantified in each iteration may be at least one of neuron data, weight data, or gradient data
  • the gradient data may also include neuron gradient data, weight gradient data, and the like.
  • in step S200, a first target iteration interval is determined according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, wherein the first target iteration interval includes at least one iteration, and the quantization parameter of the recurrent neural network is used to implement the quantization operation on the data to be quantized in the recurrent neural network operation.
  • the quantization parameter may include the above-mentioned point position and/or zoom factor, where the zoom factor may include a first zoom factor and a second zoom factor.
  • the specific point position calculation method can refer to the above formula (2-2), and the calculation method of the scaling factor can refer to the above formula (2-5) or (2-8), which will not be repeated here.
  • the quantization parameter may also include an offset, and the calculation method of the offset may refer to the above formula (2-12); furthermore, the processor may also determine the point according to the formula (2-14) For position, the zoom factor is determined according to the above formula (2-17) or (2-20).
  • the processor may update at least one of the above-mentioned point position, scaling factor or offset according to the determined target iteration interval to adjust the quantization parameter in the cyclic neural network operation.
  • the quantization parameter in the cyclic neural network operation can be updated according to the data variation range of the data to be quantized in the cyclic neural network operation, so that the quantization accuracy can be guaranteed.
  • the data change curve of the data to be quantified can be obtained by performing statistics and analysis on the change trend of the calculation data during the training or fine-tuning process of the recurrent neural network.
  • Figure 3-5a shows the variation trend diagram of the data to be quantified in the calculation process of an embodiment of the present disclosure.
• from the data variation curve it can be seen that, in the initial stage of recurrent neural network training or fine-tuning, the data to be quantized changes drastically between iterations; as the training or fine-tuning operation progresses, the changes of the data to be quantized between iterations gradually flatten out.
  • the quantization parameters can be adjusted more frequently; in the middle and late stages of cyclic neural network training or fine-tuning, the quantization parameters can be adjusted at intervals of multiple iterations or cycles.
  • the method of the present disclosure is to determine a suitable iteration interval to achieve a balance between quantization accuracy and quantization efficiency.
  • the processor may determine the first target iteration interval according to the data variation range of the data to be quantified, so as to adjust the quantization parameter in the cyclic neural network operation according to the first target iteration interval.
• the first target iteration interval may increase as the data variation range of the data to be quantized decreases. That is to say, the larger the data variation range of the data to be quantized, the smaller the first target iteration interval, indicating that the quantization parameter is adjusted more frequently.
• conversely, the smaller the data variation range of the data to be quantized, the larger the first target iteration interval, indicating that the quantization parameter is adjusted less frequently.
  • the above-mentioned first target iteration interval may also be a hyperparameter.
  • the first target iteration interval may be customized by a user.
  • the aforementioned weight data, neuron data, gradient data, and other data to be quantified may have different iteration intervals.
  • the processor may obtain the data variation amplitudes corresponding to various data to be quantized, so as to determine the first target iteration interval corresponding to the corresponding data to be quantized according to the data variation amplitudes of each type of data to be quantized.
  • the quantization process of various data to be quantized can be performed asynchronously.
• different data variation ranges of the data to be quantized can be used to determine the corresponding first target iteration intervals, and the corresponding quantization parameters are determined according to the corresponding first target iteration intervals, so that the quantization accuracy of the data to be quantized can be guaranteed, and thus the accuracy of the calculation result of the recurrent neural network can be ensured.
• the same target iteration interval (including any one of the first target iteration interval, the preset iteration interval, and the second target iteration interval) can be determined for different types of data to be quantized, so as to adjust the quantization parameter corresponding to each kind of data to be quantized according to that target iteration interval.
  • the processor may obtain the data variation range of various data to be quantized, and determine the target iteration interval according to the maximum data variation range of the data to be quantized, and determine the quantization parameters of various data to be quantized according to the target iteration interval.
  • different types of data to be quantized can also use the same quantization parameter.
  • the aforementioned cyclic neural network may include at least one arithmetic layer, and the data to be quantified may be at least one of neuron data, weight data, or gradient data involved in each arithmetic layer.
  • the processor can obtain the data to be quantized related to the current arithmetic layer, and determine the data variation range of various data to be quantized in the current arithmetic layer and the corresponding first target iteration interval according to the above method.
• the processor may determine the above-mentioned data variation range of the data to be quantized once in each iteration, and determine a first target iteration interval according to the data variation range of the corresponding data to be quantized. In other words, the processor may calculate the first target iteration interval once per iteration. For the specific calculation method of the first target iteration interval, refer to the description below. Further, the processor may select inspection iterations from the iterations according to preset conditions, determine the variation range of the data to be quantized at each inspection iteration, and update and adjust the quantization parameter and other parameters according to the first target iteration interval corresponding to the inspection iteration. If an iteration is not a selected inspection iteration, the processor may ignore the first target iteration interval corresponding to that iteration.
  • each target iteration interval may correspond to a verification iteration
  • the verification iteration may be the initial iteration of the target iteration interval or the end iteration of the target iteration interval.
  • the processor can adjust the quantization parameter of the cyclic neural network at the inspection iteration of each target iteration interval, so as to adjust the quantization parameter of the cyclic neural network operation according to the target iteration interval.
  • the verification iteration may be a point in time for verifying whether the current quantization parameter meets the requirements of the data to be quantified.
  • the quantization parameter before adjustment may be the same as the quantization parameter after adjustment, or may be different from the quantization parameter after adjustment.
  • the interval between adjacent inspection iterations may be greater than or equal to a target iteration interval.
• the target iteration interval may count the number of iterations from the current inspection iteration, and the current inspection iteration may be the starting iteration of the target iteration interval. For example, if the current inspection iteration is the 100th iteration and the processor determines that the target iteration interval is 3 according to the data variation range of the data to be quantized, the processor can determine that the target iteration interval includes 3 iterations, namely the 100th iteration, the 101st iteration, and the 102nd iteration. The processor can adjust the quantization parameter in the recurrent neural network operation at the 100th iteration. Here, the current inspection iteration is the iteration at which the processor is currently performing the update and adjustment of the quantization parameter.
  • the target iteration interval may also be the number of iterations calculated from the next iteration of the current inspection iteration, and the current inspection iteration may be the termination iteration of the previous iteration interval before the current inspection iteration.
• for example, if the processor determines that the target iteration interval is 3 according to the data variation range of the data to be quantized, the processor can determine that the target iteration interval includes 3 iterations, namely the 101st iteration, the 102nd iteration, and the 103rd iteration.
  • the processor can adjust the quantization parameter in the recurrent neural network operation at the 100th iteration and the 103rd iteration.
  • the present disclosure does not specifically limit the method for determining the target iteration interval.
  • FIGS 3-5b show an expanded schematic diagram of a recurrent neural network according to an embodiment of the present disclosure.
  • the unfolding schematic diagram of the hidden layer of the recurrent neural network is given.
  • X represents the input sample.
• W represents the weight of the input, U represents the weight of the input sample at the current moment, and V represents the weight of the output sample. Because different recurrent neural networks unfold into different numbers of layers, the total numbers of iterations contained in different cycles differ, which affects when the quantization parameters of the recurrent neural network are updated.
  • Figures 3-5c show a schematic diagram of the cycle of a recurrent neural network according to an embodiment of the present disclosure.
• iter_1, iter_2, iter_3, and iter_4 are four cycles of the recurrent neural network.
• the first cycle iter_1 includes four iterations t_0, t_1, t_2, and t_3.
• the second cycle iter_2 includes two iterations t_0 and t_1.
• the third cycle iter_3 includes three iterations t_0, t_1, and t_2.
• the fourth cycle iter_4 includes five iterations t_0, t_1, t_2, t_3, and t_4.
• the data variation range of the data to be quantized may also be determined indirectly from the variation range of the quantization parameter; that is, the data variation range of the data to be quantized can be characterized by the variation range of the quantization parameter.
  • FIGS. 3-6 show a flowchart of a method for adjusting parameters of a recurrent neural network according to an embodiment of the present disclosure.
• the above operation S100 may include operation S110, and operation S200 may include operation S210 (see the detailed description below).
  • the variation range of the point position can indirectly reflect the variation range of the data to be quantified.
  • the variation range of the point position may be determined according to the point position of the current inspection iteration and the point position of at least one historical iteration. Among them, the point position of the current test iteration and the point position of each historical iteration can be determined according to formula (2-2). Of course, the point position of the current test iteration and the point position of each historical iteration can also be determined according to formula (2-14).
  • the processor may also calculate the variance of the point position of the current test iteration and the point position of the historical iteration, and determine the variation range of the point position according to the variance.
  • the processor may determine the variation range of the point position according to the average value of the point position of the current inspection iteration and the point position of the historical iteration.
  • the foregoing operation S110 may include operation S111 to operation S113, and operation S210 may include operation S211 (see the following description for details).
  • S111 Determine a first average value according to the point position corresponding to the previous inspection iteration before the current inspection iteration and the point position corresponding to the historical iteration before the previous inspection iteration.
  • the previous inspection iteration is the iteration corresponding to the last time the quantization parameter is adjusted, and there is at least one iteration interval between the previous inspection iteration and the current inspection iteration.
  • At least one historical iteration may belong to at least one iteration interval, each iteration interval may correspond to one inspection iteration, and two adjacent inspection iterations may have one iteration interval.
  • the previous inspection iteration in the foregoing operation S111 may be the inspection iteration corresponding to the previous iteration interval before the target iteration interval.
• the first average value can be calculated according to the following formula:
• M1 = a1×s_{t-1} + a2×s_{t-2} + a3×s_{t-3} + ... + am×s_1;
• a1~am refer to the calculation weights corresponding to the point positions of each iteration
• s_{t-1} refers to the point position corresponding to the previous inspection iteration
• s_{t-2} refers to the point position corresponding to the iteration before the previous inspection iteration, s_{t-3} refers to the point position corresponding to the iteration before that, and so on down to s_1
• M1 refers to the above-mentioned first mean value.
  • the last test iteration is the 100th iteration of the cyclic neural network operation
  • the historical iteration can be from the 1st iteration to the 99th iteration
• the processor can obtain the point position of the 100th iteration (i.e., s_{t-1}), and obtain the point positions of the historical iterations before the 100th iteration: s_1 can refer to the point position corresponding to the 1st iteration of the recurrent neural network, ..., s_{t-3} can refer to the point position corresponding to the 98th iteration of the recurrent neural network, and s_{t-2} can refer to the point position corresponding to the 99th iteration of the recurrent neural network.
  • the processor may calculate the first average value according to the above formula.
  • the first average value can be calculated according to the point position of the inspection iteration corresponding to each iteration interval.
  • the first average value can be calculated according to the following formula:
  • M1 a1 ⁇ s t-1 +a2 ⁇ s t-2 +a3 ⁇ s t-3 +...+am ⁇ s 1 ;
  • a1 ⁇ am refer to the calculated weights corresponding to the point positions of each inspection iteration
  • s t-1 refers to the point positions corresponding to the previous inspection iteration
  • s t-2 refers to the point positions corresponding to the previous inspection iteration
  • s t-3 refers to the previous The point positions corresponding to the inspection iterations of the preset number of iteration intervals before the inspection iteration
  • M1 refers to the above-mentioned first mean value.
  • the last test iteration is the 100th iteration of the cyclic neural network operation
  • the historical iteration can be from the 1st iteration to the 99th iteration
  • the 99 historical iterations can be divided into 11 iteration intervals.
  • the 1st iteration to the 9th iteration belong to the 1st iteration interval
  • the 10th iteration to the 18th iteration belong to the 2nd iteration interval
  • the 90th iteration to the 99th iteration belong to the 11th iteration Iteration interval.
• the processor can obtain the point position of the 100th iteration (i.e., s_{t-1}), and obtain the point positions of the inspection iterations of the iteration intervals before the 100th iteration: s_1 can refer to the point position corresponding to the inspection iteration of the 1st iteration interval of the recurrent neural network (for example, the point position corresponding to the 1st iteration), ..., s_{t-3} can refer to the point position corresponding to the inspection iteration of the 10th iteration interval (for example, the point position corresponding to the 81st iteration), and s_{t-2} can refer to the point position corresponding to the inspection iteration of the 11th iteration interval (for example, the point position corresponding to the 90th iteration).
  • the processor may calculate the first average value M1 according to the above formula.
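• as an illustration of the weighted form of the first mean above, the following minimal sketch computes M1 from point positions ordered from the most recent inspection iteration backwards; the helper name and the example weights are hypothetical:

```python
def first_mean(point_positions, weights):
    """Weighted first mean M1 = a1*s_(t-1) + a2*s_(t-2) + ... + am*s_1.

    point_positions is ordered most recent first (s_(t-1), s_(t-2), ..., s_1);
    the weights a1..am typically decrease for iterations further away."""
    return sum(a * s for a, s in zip(weights, point_positions))

# e.g. M1 over the last three inspection iterations, weighted 0.5/0.3/0.2:
m1 = first_mean([4, 5, 5], [0.5, 0.3, 0.2])
```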
• optionally, each iteration interval may contain the same number of iterations.
• alternatively, the numbers of iterations contained in the iteration intervals of the recurrent neural network may differ.
  • the number of iterations included in the iteration interval increases with the increase of iterations, that is, as the training or fine-tuning of the cyclic neural network proceeds, the iteration interval may become larger and larger.
• the above-mentioned first average value M1 can also be calculated using the following sliding-average formula:
• M1 = α×s_{t-1} + (1−α)×M0
• where α refers to the calculation weight of the point position corresponding to the previous inspection iteration, s_{t-1} refers to the point position corresponding to the previous inspection iteration, and M0 refers to the moving average corresponding to the inspection iteration before the previous inspection iteration.
  • S112 Determine a second average value according to the point position corresponding to the current inspection iteration and the point position of the historical iteration before the current inspection iteration.
  • the point position corresponding to the current inspection iteration can be determined according to the target data bit width of the current inspection iteration and the data to be quantified.
• the second mean value M2 can be calculated according to the following formula:
• M2 = b1×s_t + b2×s_{t-1} + b3×s_{t-2} + ... + bm×s_1;
• b1~bm refer to the calculation weights corresponding to the point positions of each iteration
• s_t refers to the point position corresponding to the current inspection iteration, and s_1 refers to the point position corresponding to a historical iteration before the current inspection iteration
• M2 refers to the above-mentioned second mean value.
  • the current inspection iteration is the 101st iteration of the cyclic neural network operation
  • the historical iteration before the current inspection iteration refers to the 1st iteration to the 100th iteration.
• the processor can obtain the point position of the 101st iteration (i.e., s_t), and obtain the point positions of the historical iterations before the 101st iteration: s_1 can refer to the point position corresponding to the 1st iteration of the recurrent neural network, ..., s_{t-2} can refer to the point position corresponding to the 99th iteration, and s_{t-1} can refer to the point position corresponding to the 100th iteration.
  • the processor may calculate the second average value M2 according to the above formula.
  • the second average value may be calculated according to the point position of the inspection iteration corresponding to each iteration interval.
  • FIG. 3-8 shows a flowchart of a method for determining the second mean value in an embodiment of the present disclosure.
  • the above operation S112 may include the following operations:
  • the second average value can be calculated according to the following formula:
  • M2 b1 ⁇ s t + b2 ⁇ s t-1 + b3 ⁇ s t-2 +...+bm ⁇ s 1 ;
  • b1 ⁇ bm means calculates a weight center position of each iteration corresponding to the heavy
  • s t is the current test iteration corresponding point position
  • s 1 refers to the test prior to the current test iteration
  • M2 refers to the second mean value mentioned above.
  • the current inspection iteration is the 100th iteration
  • the historical iteration may be from the 1st iteration to the 99th iteration
  • the 99 historical iterations may belong to 11 iteration intervals.
  • the 1st iteration to the 9th iteration belong to the 1st iteration interval
  • the 10th iteration to the 18th iteration belong to the 2nd iteration interval
  • the 90th iteration to the 99th iteration belong to the 11th iteration Iteration interval.
• the processor can obtain the point position of the 100th iteration (i.e., s_t), and obtain the point positions of the inspection iterations of the iteration intervals before the 100th iteration: s_1 can refer to the point position corresponding to the inspection iteration of the 1st iteration interval of the recurrent neural network (for example, the point position corresponding to the 1st iteration), ..., s_{t-2} can refer to the point position corresponding to the inspection iteration of the 10th iteration interval, and s_{t-1} can refer to the point position corresponding to the inspection iteration of the 11th iteration interval (for example, the point position corresponding to the 90th iteration).
• the processor may calculate the second average value M2 according to the above formula.
• optionally, each iteration interval may contain the same number of iterations; alternatively, the numbers of iterations contained in different iteration intervals may differ.
  • the number of iterations included in the iteration interval increases with the increase of iterations, that is, as the training or fine-tuning of the cyclic neural network proceeds, the iteration interval may become larger and larger.
• the processor may determine the second average value according to the point position corresponding to the current inspection iteration and the first average value, that is, the second average value may be calculated according to the following sliding-average formula:
• M2 = β×s_t + (1−β)×M1
• where β refers to the calculation weight of the point position corresponding to the current inspection iteration, and M1 refers to the above-mentioned first mean value.
• the first error may be equal to the absolute value of the difference between the second average value and the aforementioned first average value, that is:
• diff_update1 = |M2 − M1|
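• combining the sliding-average formulas, the computation of the first error can be sketched as follows (the α and β values are illustrative hyperparameters):

```python
def point_position_variation(s_t, s_prev, m0, alpha=0.9, beta=0.9):
    """Sketch of the moving-average variants above: M1 from the previous
    inspection iteration, M2 from the current one, and the first error
    diff_update1 = |M2 - M1| as the point-position variation range."""
    m1 = alpha * s_prev + (1 - alpha) * m0   # M1 = alpha*s_(t-1) + (1-alpha)*M0
    m2 = beta * s_t + (1 - beta) * m1        # M2 = beta*s_t + (1-beta)*M1
    return m1, m2, abs(m2 - m1)              # diff_update1 = |M2 - M1|
```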
  • the above-mentioned point position of the current inspection iteration can be determined according to the data to be quantified in the current inspection iteration and the target data bit width corresponding to the current inspection iteration.
  • the target data bit width corresponding to the current inspection iteration described above may be a hyperparameter.
  • the target data bit width corresponding to the current inspection iteration may be user-defined input.
  • the data bit width corresponding to the data to be quantized in the training or fine-tuning process of the recurrent neural network may be constant, that is, the same type of data to be quantized in the same recurrent neural network is quantized with the same data bit width, for example, for The neuron data in each iteration of the cyclic neural network is quantized with a data width of 8 bits.
  • the data bit width corresponding to the data to be quantized in the training or fine-tuning process of the recurrent neural network is variable to ensure that the data bit width can meet the quantization requirements of the data to be quantized. That is, the processor can adaptively adjust the data bit width corresponding to the data to be quantized according to the data to be quantized, and obtain the target data bit width corresponding to the data to be quantized. Specifically, the processor may first determine the target data bit width corresponding to the current inspection iteration, and then the processor may determine the current inspection iteration corresponding to the target data bit width corresponding to the current inspection iteration and the data to be quantified corresponding to the current inspection iteration Point location.
  • FIG. 3-9 shows a flowchart of a method for adjusting a data bit width in an embodiment of the present disclosure.
  • the foregoing operation S110 may include:
  • the foregoing processor may use the initial data bit width to quantize the data to be quantized to obtain the foregoing quantized data.
  • the initial data bit width of the current inspection iteration may be a hyperparameter, and the initial data bit width of the current inspection iteration may also be determined based on the data to be quantified of the previous inspection iteration before the current inspection iteration.
  • the processor may determine the intermediate representation data according to the data to be quantified in the current inspection iteration and the quantitative data in the current inspection iteration.
  • the intermediate representation data is consistent with the aforementioned representation format of the data to be quantized.
  • the processor may dequantize the above-mentioned quantized data to obtain intermediate representation data consistent with the representation format of the data to be quantized, where dequantization refers to the inverse process of quantization.
• the quantized data can be obtained using the above formula (2-3), and the processor can also dequantize the quantized data according to the above formula (2-4) to obtain the corresponding intermediate representation data, and then determine the quantization error according to the data to be quantized and the intermediate representation data.
  • the processor may calculate the quantization error according to the data to be quantized and the corresponding intermediate representation data.
  • the processor may determine an error term according to the to-be-quantized data F x and its corresponding intermediate representation data F x1 , and determine the quantization error according to the error term.
  • the processor may determine the above-mentioned error term according to the sum of the elements in the intermediate representation data F x1 and the sum of the elements in the to-be-quantized data F x .
• the error term may be the difference between the sum of the elements in the intermediate representation data F_x1 and the sum of the elements in the data to be quantized F_x, and the processor can determine the quantization error according to this error term.
  • the specific quantization error can be determined according to the following formula:
• z_i is an element in the data to be quantized
• z_i^(n) is an element in the intermediate representation data F_x1.
• alternatively, the processor may calculate the difference between each element of the data to be quantized and the corresponding element of the intermediate representation data F_x1, obtain m difference values, and use the sum of these m difference values as the error term. After that, the processor can determine the quantization error according to the error term.
  • the specific quantization error can be determined according to the following formula:
• z_i is an element in the data to be quantized
• z_i^(n) is an element in the intermediate representation data F_x1.
• the difference between each element in the data to be quantized and the corresponding element in the intermediate representation data F_x1 may be approximately equal to 2^(s−1); therefore, the quantization error may also be determined according to the following formula:
• m is the number of elements in the intermediate representation data F_x1 corresponding to the target data
• s is the point position
• z_i is an element in the data to be quantized.
  • the intermediate representation data can also be consistent with the data representation format of the aforementioned quantized data, and the quantization error is determined based on the intermediate representation data and the quantized data.
• the data to be quantized can be expressed as F_x ≈ I_x × 2^s; the intermediate representation data I_x1 can then be determined accordingly.
  • the intermediate representation data I x1 may have the same data representation format as the aforementioned quantized data.
• the processor can then determine the quantization error according to the intermediate representation data I_x1 and the above formula (2-3).
  • the specific quantization error determination method can refer to the above formula (2-31) to formula (2-33).
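• a minimal sketch of this error check, assuming plain round-to-nearest quantization at point position s (cf. formulas (2-3)/(2-4)); since formulas (2-31)~(2-33) are not reproduced in this section, the relative error over the element sums used here is only one plausible instantiation:

```python
def quantization_error(data, s):
    """Quantize at point position s, dequantize back to the intermediate
    representation data, and measure the relative discrepancy."""
    quantized = [round(x / 2 ** s) for x in data]    # cf. formula (2-3)
    intermediate = [q * 2 ** s for q in quantized]   # dequantization, cf. formula (2-4)
    num = sum(abs(a - b) for a, b in zip(intermediate, data))
    den = sum(abs(x) for x in data)
    return num / den if den else 0.0
```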
  • the processor may adaptively adjust the data bit width corresponding to the current inspection iteration according to the quantization error, and determine the target data bit width adjusted by the current inspection iteration.
• if the quantization error satisfies a preset condition, the data bit width corresponding to the current inspection iteration can be kept unchanged, that is, the target data bit width of the current inspection iteration can be equal to the initial data bit width.
• otherwise, the processor can adjust the data bit width corresponding to the data to be quantized in the current inspection iteration to obtain the target data bit width corresponding to the current inspection iteration, such that when the processor uses the target data bit width to quantize the data to be quantized in the current inspection iteration, the quantization error satisfies the aforementioned preset condition.
  • the aforementioned preset condition may be a preset threshold set by the user.
  • FIG. 3-10 shows a flowchart of a data bit width adjustment method in another embodiment of the present disclosure.
  • the foregoing operation S115 may include:
  • the processor may determine whether the aforementioned quantization error is greater than or equal to a first preset threshold.
• if so, operation S1151 may be performed to increase the data bit width corresponding to the current inspection iteration to obtain the target data bit width of the current inspection iteration.
• if the quantization error is less than the first preset threshold, the data bit width of the current inspection iteration can be kept unchanged.
• optionally, the processor may obtain the aforementioned target data bit width after one adjustment. For example, if the initial data bit width of the current inspection iteration is n1, the processor may adjust it once to obtain the target data bit width n2; when the target data bit width n2 is used to quantize the data to be quantized in the current inspection iteration, the obtained quantization error may be less than the first preset threshold.
• optionally, the processor may obtain the target data bit width through multiple adjustments until the quantization error is less than the first preset threshold, and use the data bit width at that point as the target data bit width. Specifically, if the quantization error is greater than or equal to the first preset threshold, a first intermediate data bit width is determined according to the first preset bit width step size; the processor then quantizes the data to be quantized in the current inspection iteration according to the first intermediate data bit width to obtain quantized data, and determines the quantization error according to the data to be quantized in the current inspection iteration and this quantized data, repeating until the quantization error is less than the first preset threshold. The processor may use the data bit width corresponding to the quantization error being less than the first preset threshold as the target data bit width.
• for example, if the initial data bit width of the current inspection iteration is n1, the processor can use the initial data bit width n1 to quantize the data to be quantized A of the current inspection iteration to obtain the quantized data B1, and calculate the quantization error C1 according to the data to be quantized A and the quantized data B1.
• the aforementioned first preset bit width step size may be a constant value; for example, whenever the quantization error is greater than the first preset threshold, the processor may increase the data bit width corresponding to the current inspection iteration by the same bit width value.
• the aforementioned first preset bit width step size may also be a variable value; for example, the processor may calculate the difference between the quantization error and the first preset threshold, and the smaller that difference, the smaller the value of the first preset bit width step size.
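• the multi-adjustment loop of the increase branch can be sketched as follows, reusing the quant_params and quantization_error helpers sketched earlier (the constant step is illustrative):

```python
def widen_bit_width(data, n1, step, first_threshold):
    """Operation S115, increase branch: widen the data bit width by the first
    preset bit width step until the quantization error drops below the first
    preset threshold, then return the target data bit width."""
    n = n1
    s, _, _ = quant_params(data, n)                  # point position at this bit width
    while quantization_error(data, s) >= first_threshold:
        n += step                                    # first intermediate data bit width
        s, _, _ = quant_params(data, n)              # re-quantize and re-check
    return n
```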
  • FIG. 3-11 shows a flowchart of a data bit width adjustment method in another embodiment of the present disclosure.
  • the foregoing operation S115 may further include:
• the processor may determine whether the aforementioned quantization error is less than or equal to a second preset threshold.
• if so, operation S1153 may be performed to reduce the data bit width corresponding to the current inspection iteration to obtain the target data bit width of the current inspection iteration.
• if the quantization error is greater than the second preset threshold, the data bit width of the current inspection iteration can be kept unchanged.
• optionally, the processor may obtain the aforementioned target data bit width after one adjustment. For example, if the initial data bit width of the current inspection iteration is n1, the processor may adjust it once to obtain the target data bit width n2; when the target data bit width n2 is used to quantize the data to be quantized in the current inspection iteration, the obtained quantization error may be greater than the second preset threshold.
• optionally, the processor may obtain the target data bit width through multiple adjustments until the quantization error is greater than the second preset threshold, and use the data bit width at that point as the target data bit width. Specifically, if the quantization error is less than or equal to the second preset threshold, a second intermediate data bit width is determined according to the second preset bit width step size; the processor then quantizes the data to be quantized in the current inspection iteration according to the second intermediate data bit width to obtain quantized data, and determines the quantization error according to the data to be quantized in the current inspection iteration and this quantized data, repeating until the quantization error is greater than the second preset threshold.
  • the processor may use the data bit width corresponding to when the quantization error is greater than the second preset threshold value as the target data bit width.
• for example, if the initial data bit width of the current inspection iteration is n1, the processor can use the initial data bit width n1 to quantize the data to be quantized A of the current inspection iteration to obtain the quantized data B1, and calculate the quantization error C1 according to the data to be quantized A and the quantized data B1.
• the aforementioned second preset bit width step size may be a constant value; for example, whenever the quantization error is less than the second preset threshold, the processor may reduce the data bit width corresponding to the current inspection iteration by the same bit width value.
• the aforementioned second preset bit width step size may also be a variable value; for example, the processor may calculate the difference between the quantization error and the second preset threshold, and the smaller that difference, the smaller the value of the second preset bit width step size.
  • FIG. 3-12 shows a flowchart of a data bit width adjustment method in another embodiment of the present disclosure.
• when the processor determines that the quantization error is less than the first preset threshold and greater than the second preset threshold, the data bit width of the current inspection iteration can be kept unchanged, where the first preset threshold is greater than the second preset threshold. That is, the target data bit width of the current inspection iteration can be equal to the initial data bit width.
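• the overall two-threshold decision across Figures 3-10 to 3-12 can be summarized by the following sketch (th1 > th2; the step sizes and the 2-bit floor are illustrative assumptions):

```python
def adjust_bit_width(quant_err, n, th1, th2, step_up=2, step_down=1):
    """Keep the bit width while th2 < quant_err < th1, widen it when the error
    reaches the first preset threshold th1, and narrow it when the error falls
    to the second preset threshold th2."""
    if quant_err >= th1:
        return n + step_up
    if quant_err <= th2:
        return max(2, n - step_down)
    return n
```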
  • FIGS. 3-12 only illustrate the data bit width determination method of an embodiment of the present disclosure by way of example, and the sequence of each operation in FIGS. 3-12 can be adjusted adaptively, which is not specifically limited here.
  • FIG. 3-13 shows a flowchart of a method for determining the second mean value in another embodiment of the present disclosure. As shown in FIG. 3-13, the above method may further include:
• if the data bit width adjustment value is greater than the preset parameter (for example, the preset parameter may be equal to zero), that is, when the data bit width of the current inspection iteration increases, the processor may reduce the second average value accordingly; if the data bit width adjustment value is less than the preset parameter, that is, when the data bit width of the current inspection iteration decreases, the processor may increase the second average value accordingly.
• if the data bit width adjustment value is equal to the preset parameter, that is, when the data bit width adjustment value is equal to 0, the data bit width corresponding to the current inspection iteration has not changed; the updated second average value is then equal to the second average value before the update, which is calculated according to the above formula (2-29).
• in this case, the processor may not update the second average value, that is, the processor may not perform the above operation S117.
• when the data bit width of the current inspection iteration increases by Δn, the updated second average value M2 ← β×(s_t − Δn) + (1−β)×(M1 − Δn); equivalently, M2 ← β×s_t + (1−β)×M1 − Δn, where s_t refers to the point position determined in the current inspection iteration according to the target data bit width.
• when the data bit width of the current inspection iteration decreases by Δn, the updated second average value M2 ← β×(s_t + Δn) + (1−β)×(M1 + Δn); equivalently, M2 ← β×s_t + (1−β)×M1 + Δn, where s_t likewise refers to the point position determined in the current inspection iteration according to the target data bit width.
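• using the equivalent forms above, the update of the second mean under a signed bit width change Δn can be sketched as follows (β is an illustrative weight):

```python
def update_second_mean(s_t, m1, delta_n, beta=0.9):
    """Sketch of operation S117: s_t is the point position determined with the
    target data bit width; delta_n > 0 for an increased bit width (second mean
    shrinks), delta_n < 0 for a decreased one (second mean grows)."""
    return beta * s_t + (1 - beta) * m1 - delta_n
```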
  • the foregoing operation S200 may include:
  • the above-mentioned first error may represent the variation range of the point position. Therefore, as shown in FIGS. 3-7, the above-mentioned operation S210 may include:
  • the processor may determine the first target iteration interval according to the first error, where the first target iteration interval is negatively correlated with the first error. That is, the larger the first error, the larger the variation range of the point position, which in turn indicates the larger the data variation range of the data to be quantized. At this time, the first target iteration interval is smaller.
• the processor may calculate the first target iteration interval I according to the following formula:
• I = δ / diff_update1 − γ
• where I is the first target iteration interval, diff_update1 represents the above-mentioned first error, and δ and γ may be hyperparameters.
  • the first error can be used to measure the variation range of the point position.
• by calculating the variation range of the point position, the first target iteration interval is determined according to that variation range (the first error). Since the quantization parameter is determined according to the first target iteration interval, the quantized data obtained according to the quantization parameter can better conform to the change trend of the point position of the target data, improving the operating efficiency of the recurrent neural network while ensuring the quantization accuracy.
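• assuming the interval rule I = δ/diff_update1 − γ given above, the computation can be sketched as follows (the δ and γ values and the clamping bounds are illustrative assumptions):

```python
def first_target_iteration_interval(diff_update1, delta=20.0, gamma=2.0, max_interval=100):
    """Larger first error -> shorter first target iteration interval
    (negative correlation); a cap handles the near-zero-error case."""
    if diff_update1 <= 0:
        return max_interval
    return max(1, min(max_interval, int(delta / diff_update1 - gamma)))
```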
• the processor may further determine, at the current inspection iteration, the quantization parameter and the data bit width corresponding to the first target iteration interval, so as to update the quantization parameter according to the first target iteration interval.
  • the quantization parameter may include a point position and/or a scaling factor. Further, the quantization parameter may also include an offset.
  • Figure 3-14 shows a flowchart of a method for adjusting a quantization parameter according to another embodiment of the present disclosure. As shown in Figure 3-14, the above method may further include:
  • the processor adjusts the quantization parameter in the cyclic neural network operation according to the first target iteration interval.
• the processor may determine update iterations (also called inspection iterations) according to the first target iteration interval and the total number of iterations in each cycle, update the first target iteration interval at each update iteration, and also update the quantization parameter at each update iteration. For example, if the data bit width in the recurrent neural network operation remains unchanged, the processor can directly adjust quantization parameters such as the point position at each update iteration according to the data to be quantized of that update iteration. As another example, if the data bit width in the recurrent neural network operation is variable, the processor can update the data bit width at each update iteration, and adjust quantization parameters such as the point position according to the updated data bit width and the data to be quantized of that update iteration.
  • the processor updates the quantization parameter at each inspection iteration to ensure that the current quantization parameter meets the quantization requirement of the data to be quantized.
  • the first target iteration interval before the update and the first target iteration interval after the update may be the same or different.
  • the data bit width before the update and the data bit width after the update can be the same or different; that is, the data bit width of different iteration intervals can be the same or different.
  • the quantization parameter before the update and the quantization parameter after the update may be the same or different; that is, the quantization parameters at different iteration intervals may be the same or different.
  • the processor may determine the quantization parameter in the first target iteration interval at the update iteration, so as to adjust the quantization parameter in the recurrent neural network operation.
• when the method is used in the training or fine-tuning process of the recurrent neural network, operation S200 may include:
• the processor determines whether the current inspection iteration is greater than a first preset iteration, wherein, when the current inspection iteration is greater than the first preset iteration, the first target iteration interval is determined according to the data variation range of the data to be quantized, and when the current inspection iteration is less than or equal to the first preset iteration, the quantization parameter is adjusted according to a preset iteration interval.
  • the current inspection iteration refers to the iterative operation currently performed by the processor.
  • the first preset iteration may be a hyperparameter, the first preset iteration may be determined according to a data variation curve of the data to be quantified, and the first preset iteration may also be set by a user.
• the first preset iteration may be less than the total number of iterations included in one cycle (epoch), where one cycle means that all the data to be quantized in the data set complete one forward operation and one backward operation.
  • the processor may read the first preset iteration input by the user, and determine the preset iteration interval according to the correspondence between the first preset iteration and the preset iteration interval.
  • the preset iteration interval may be a hyperparameter, and the preset iteration interval may also be set by a user.
  • the processor can directly read the first preset iteration and the preset iteration interval input by the user, and update the quantization parameter in the cyclic neural network operation according to the preset iteration interval.
  • the processor does not need to determine the target iteration interval according to the data variation range of the data to be quantified.
• the quantization parameter can be updated according to the preset iteration interval. That is, the processor can determine to update the quantization parameter every 5 iterations from the 1st iteration to the 100th iteration of the training or fine-tuning of the recurrent neural network. Specifically, the processor may determine quantization parameters such as the data bit width n1 and the point position s1 corresponding to the 1st iteration, and use this data bit width n1 and point position s1 to quantize the data to be quantized from the 1st iteration to the 5th iteration; that is, the same quantization parameters can be used from the 1st iteration to the 5th iteration.
• the processor can then determine quantization parameters such as the data bit width n2 and the point position s2 corresponding to the 6th iteration, and use this data bit width n2 and point position s2 to quantize the data to be quantized from the 6th iteration to the 10th iteration; that is, the same quantization parameters can be used from the 6th iteration to the 10th iteration.
  • the processor can follow the above quantization method until the 100th iteration is completed.
  • the method for determining the quantization parameters such as the data bit width and point positions in each iteration interval can be referred to the above description, and will not be repeated here.
  • the first preset iteration input by the user is the 100th iteration, and the preset iteration interval is 1.
  • the quantization parameter can be updated according to the preset iteration interval. That is, the processor can determine to update the quantization parameter from the first iteration to the 100th iteration of the training or fine-tuning of the recurrent neural network. Specifically, the processor may determine quantization parameters such as the data bit width n1 and point position s1 corresponding to the first iteration, and use the data bit width n1 and point position s1 to quantize the data to be quantized in the first iteration .
• then the processor can determine quantization parameters such as the data bit width n2 and point position s2 corresponding to the 2nd iteration, and use the data bit width n2 and point position s2 to quantize the data to be quantized in the 2nd iteration.
  • the processor can determine quantization parameters such as the data bit width n100 and point position s100 of the 100th iteration, and use the data bit width n100 and point position s100 to quantize the data to be quantized in the 100th iteration .
  • the method for determining the quantization parameters such as the data bit width and point positions in each iteration interval can be referred to the above description, and will not be repeated here.
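• the preset schedule of the two examples above can be sketched as follows (the 100-iteration boundary and the interval values are those of the examples):

```python
def preset_update_iterations(first_preset_iteration=100, preset_interval=5):
    """Iterations at which the quantization parameters are refreshed before the
    first preset iteration: 1, 6, 11, ... for an interval of 5, or every
    iteration when the interval is 1."""
    return list(range(1, first_preset_iteration + 1, preset_interval))
```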
• the processor may also determine the point position iteration interval according to the variation range of the point position, and update quantization parameters such as the point position according to the point position iteration interval.
• when the current inspection iteration is greater than the first preset iteration, it can indicate that the training or fine-tuning of the recurrent neural network is in the middle stage.
• at this time, the data variation range of the data to be quantized over the historical iterations can be obtained, and the first target iteration interval can be determined according to the data variation range of the data to be quantized; the first target iteration interval may be greater than the above-mentioned preset iteration interval, thereby reducing the number of times the quantization parameter is updated and improving the quantization efficiency and computing efficiency.
  • the first target iteration interval is determined according to the data variation range of the data to be quantified.
  • the first preset iteration input by the user is the 100th iteration
  • the preset iteration interval is 1.
  • the quantization parameter can be updated according to the preset iteration interval. That is, the processor can determine to update the quantization parameter in each iteration from the first iteration to the 100th iteration of the training or fine-tuning of the recurrent neural network.
• when the current inspection iteration is greater than the first preset iteration, the processor can determine the data variation range of the data to be quantized according to the data to be quantized in the current inspection iteration and the data to be quantized in the previous historical iterations, and determine the first target iteration interval based on that data variation range. Specifically, when the current inspection iteration is greater than the 100th iteration, the processor can adaptively adjust the data bit width corresponding to the current inspection iteration, obtain the target data bit width corresponding to the current inspection iteration, and take that target data bit width as the data bit width of the first target iteration interval, where the data bit widths corresponding to the iterations in the first target iteration interval are consistent.
  • the processor may determine the point position corresponding to the current inspection iteration according to the target data bit width corresponding to the current inspection iteration and the data to be quantified, and determine the first error according to the point position corresponding to the current inspection iteration.
  • the processor may also determine the quantization error according to the data to be quantized corresponding to the current inspection iteration, and determine the second error according to the quantization error.
  • the processor may determine the first target iteration interval according to the first error and the second error, and the first target iteration interval may be greater than the aforementioned preset iteration interval. Further, the processor may determine quantization parameters such as point positions or scaling coefficients in the first target iteration interval, and the specific determination method may refer to the above description.
  • the processor may determine that the first target iteration interval includes 3 iterations. These are the 100th iteration, the 101st iteration and the 102nd iteration.
• the processor may also determine the quantization error according to the data to be quantized in the 100th iteration, determine the second error and the target data bit width corresponding to the 100th iteration according to the quantization error, and use this target data bit width as the data bit width corresponding to the first target iteration interval, so that the data bit widths of the 100th iteration, the 101st iteration, and the 102nd iteration are all the target data bit width corresponding to the 100th iteration.
  • the processor may also determine quantization parameters such as point positions and scaling factors corresponding to the 100th iteration according to the data to be quantized in the 100th iteration and the target data bit width corresponding to the 100th iteration. After that, the quantization parameters corresponding to the 100th iteration are used to quantize the 100th iteration, the 101st iteration, and the 102nd iteration.
  • operation S200 may further include:
  • the second target iteration interval and the total number of iterations in each cycle are used to determine the second inspection iteration corresponding to the current inspection iteration.
  • the second preset iteration is greater than the first preset iteration
  • the quantitative adjustment process of the cyclic neural network includes multiple cycles, and the total number of iterations in the multiple cycles is not consistent.
  • the processor may further determine whether the current inspection iteration is greater than the second preset iteration.
  • the second preset iteration is greater than the first preset iteration
  • the second preset iteration interval is greater than the preset iteration interval.
  • the foregoing second preset iteration may be a hyperparameter, and the second preset iteration may be greater than the total number of iterations in at least one cycle.
  • the second preset iteration may be determined according to the data variation curve of the data to be quantified.
  • the second preset iteration may also be customized by the user.
  • determining the second target iteration interval corresponding to the current inspection iteration according to the first target iteration interval and the total number of iterations in each cycle includes:
• the second target iteration interval is determined according to the first target iteration interval, the iteration sequence number, and the total numbers of iterations of the cycles between the current cycle and the update cycle.
• for example, suppose it is determined at the t_2 iteration of the first cycle iter_1 that the quantization parameter needs to be updated; since the iteration sequence number 3 of the t_2 iteration of the first cycle iter_1 is greater than the total number of iterations of the second cycle, the next update iteration corresponding to the t_2 iteration of the first cycle iter_1 becomes the t_2 iteration in the third cycle iter_3.
• similarly, suppose it is determined at the t_3 iteration of the first cycle iter_1 that the quantization parameter needs to be updated; since the iteration sequence number 4 of the t_3 iteration of the first cycle iter_1 is greater than the total numbers of iterations of both the second and the third cycle, the next update iteration corresponding to the t_3 iteration of the first cycle iter_1 becomes the t_3 iteration in the fourth cycle iter_4.
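• the jump across cycles in these examples can be sketched as follows, where cycle_lengths holds the total number of iterations of each cycle (e.g. [4, 2, 3, 5] for iter_1 to iter_4 above):

```python
def next_update_iteration(cycle_lengths, cycle_idx, t):
    """Map an update due at iteration t (0-based, so t_2 means t=2) of cycle
    cycle_idx onto the next cycle long enough to contain that iteration."""
    seq_no = t + 1                               # 1-based sequence number within a cycle
    for c in range(cycle_idx + 1, len(cycle_lengths)):
        if cycle_lengths[c] >= seq_no:           # this later cycle contains iteration t
            return c, t
    return None                                  # no later cycle is long enough

# an update due at t_2 of iter_1 lands at t_2 of iter_3 (indices are 0-based):
assert next_update_iteration([4, 2, 3, 5], 0, 2) == (2, 2)
# an update due at t_3 of iter_1 lands at t_3 of iter_4:
assert next_update_iteration([4, 2, 3, 5], 0, 3) == (3, 3)
```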
  • the processor can update the quantization parameter and the first target iteration interval according to the preset iteration interval and the second target iteration interval.
  • the second target iteration interval is called the reference iteration interval or the target iteration interval.
• the processor can achieve the purpose of adjusting the quantization parameter in the recurrent neural network operation according to the reference iteration interval by determining quantization parameters such as the point position in the reference iteration interval, where the quantization parameters corresponding to the iterations in the reference iteration interval may be consistent.
• each iteration in the reference iteration interval uses the same point position, and quantization parameters such as the point position are determined and updated only at each inspection iteration, which avoids updating and adjusting the quantization parameters at every iteration, thereby reducing the amount of calculation in the quantization process and improving the efficiency of the quantization operation.
• the processor may determine the point position corresponding to the current inspection iteration according to the data to be quantized in the current inspection iteration and the target data bit width corresponding to the current inspection iteration, and use the point position corresponding to the current inspection iteration as the point position corresponding to the reference iteration interval; the iterations in the reference iteration interval all use the point position corresponding to the current inspection iteration.
  • the target data bit width corresponding to the current inspection iteration may be a hyperparameter.
  • the target data bit width corresponding to the current inspection iteration is customized by the user.
  • the point position corresponding to the current inspection iteration can be calculated by referring to the above formula (2-2) or formula (2-14).
• the data bit width corresponding to each iteration of the recurrent neural network operation may change, that is, the data bit widths corresponding to different reference iteration intervals may be inconsistent, but the data bit width of each iteration within a reference iteration interval remains constant.
  • the data bit width corresponding to the iteration in the reference iteration interval may be a hyperparameter.
  • the data bit width corresponding to the iteration in the reference iteration interval may be user-defined input.
  • the data bit width corresponding to the iteration in the reference iteration interval may also be calculated by the processor.
• the processor may determine the target data bit width corresponding to the current inspection iteration according to the data to be quantized in the current inspection iteration, and use the target data bit width corresponding to the current inspection iteration as the data bit width corresponding to the reference iteration interval.
• correspondingly, quantization parameters such as the point position in the reference iteration interval may also remain unchanged. That is to say, each iteration in the reference iteration interval uses the same point position, and quantization parameters such as the point position and the data bit width are determined and updated only at each inspection iteration, which avoids updating and adjusting the quantization parameters at every iteration, thereby reducing the amount of calculation in the quantization process and improving the efficiency of the quantization operation.
  • the scaling factors corresponding to iterations in the reference iteration interval may be consistent.
  • the processor may determine the scaling factor corresponding to the current test iteration according to the to-be-quantized data of the current test iteration, and use the scaling factor corresponding to the current test iteration as the scaling factor of each iteration in the reference iteration interval. Wherein, the scaling factors corresponding to iterations in the reference iteration interval are consistent.
  • the offsets corresponding to the iterations in the reference iteration interval are consistent.
  • the processor may determine the offset corresponding to the current inspection iteration according to the to-be-quantized data of the current inspection iteration, and use the offset corresponding to the current inspection iteration as the offset of each iteration in the reference iteration interval. Further, the processor may also determine the minimum value and the maximum value among all the elements of the data to be quantized, and further determine quantization parameters such as point positions and scaling factors. For details, please refer to the above description.
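• As a hedged sketch of the preceding description (determining the minimum and maximum among all elements of the data to be quantized and deriving quantization parameters from them), the midpoint offset and half-range scaling below are assumed forms; the excerpt does not reproduce the exact formulas.

    def offset_and_scale(data, bit_width):
        # Sketch only: midpoint offset and half-range scale are assumptions.
        z_min, z_max = min(data), max(data)
        offset = (z_max + z_min) / 2               # center of the data range
        half_range = (z_max - z_min) / 2           # magnitude after shifting by the offset
        levels = 2 ** (bit_width - 1) - 1          # representable magnitude at this bit width
        scale = half_range / levels if levels else 1.0  # scaling factor per quantization step
        return offset, scale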
• the reference iteration interval may be counted starting from the current inspection iteration; that is, the inspection iteration corresponding to the reference iteration interval may be the initial iteration of the reference iteration interval.
• for example, if the current inspection iteration is the 100th iteration and the processor determines, according to the data variation range of the data to be quantized, that the reference iteration interval is 3, the processor can determine that the reference iteration interval includes 3 iterations: the 100th, the 101st, and the 102nd iteration.
• the processor can determine the quantization parameters such as the point position corresponding to the 100th iteration according to the data to be quantized and the target data bit width corresponding to the 100th iteration, and can use these quantization parameters to quantize the 100th, 101st, and 102nd iterations.
  • the processor does not need to calculate quantization parameters such as point positions in the 101st iteration and the 102nd iteration, which reduces the amount of calculation in the quantization process and improves the efficiency of the quantization operation.
• the reference iteration interval may also be counted starting from the next iteration after the current inspection iteration; that is, the inspection iteration corresponding to the reference iteration interval may also be the termination iteration of the reference iteration interval.
• for example, if the current inspection iteration is the 100th iteration and the processor determines, according to the data variation range of the data to be quantized, that the reference iteration interval is 3, the processor may determine that the reference iteration interval includes 3 iterations: the 101st, the 102nd, and the 103rd iteration.
• the processor can determine the quantization parameters such as the point position corresponding to the 100th iteration according to the data to be quantized and the target data bit width corresponding to the 100th iteration, and can use these quantization parameters to quantize the 101st, 102nd, and 103rd iterations.
  • the processor does not need to calculate quantization parameters such as point positions in the 102nd iteration and the 103rd iteration, which reduces the amount of calculation in the quantization process and improves the efficiency of the quantization operation.
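• The two examples above can be summarized by the following Python sketch, in which quantization parameters are computed once at an inspection iteration and reused for every iteration of the reference iteration interval; determine_interval, compute_params, and quantize_and_run are hypothetical helpers standing in for the operations described above.

    def run_with_reference_intervals(total_iters, determine_interval, compute_params, quantize_and_run):
        it = 0
        while it < total_iters:
            interval = determine_interval(it)      # e.g. 3, from the data variation range
            params = compute_params(it)            # point position etc. at the inspection iteration
            for i in range(it, min(it + interval, total_iters)):
                quantize_and_run(i, params)        # iterations in the interval share the parameters
            it += interval                         # the next inspection iteration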
• the data bit widths and quantization parameters corresponding to each iteration in the same reference iteration interval are consistent; that is, the data bit width, point position, scaling factor, and offset corresponding to each iteration in the same reference iteration interval all remain unchanged, so that during the training or fine-tuning of the cyclic neural network, frequent adjustment of the quantization parameters of the data to be quantized can be avoided and the amount of calculation in the quantization process reduced, thereby improving quantization efficiency while guaranteeing quantization accuracy.
  • FIGS. 3-15 show a flow chart of adjusting quantization parameters in a quantization parameter adjustment method according to an embodiment of the present disclosure.
  • the foregoing operation S300 may further include:
  • S310 Determine the data bit width corresponding to the reference iteration interval according to the to-be-quantized data of the current inspection iteration; wherein the data bit width corresponding to the iteration in the reference iteration interval is consistent.
• the data bit width during the operation of the cyclic neural network is updated once every reference iteration interval.
  • the data bit width corresponding to the reference iteration interval may be the target data bit width of the current inspection iteration.
• for the target data bit width of the current inspection iteration, please refer to operations S114 and S115 above, which will not be repeated here.
• the reference iteration interval may be counted starting from the current inspection iteration; that is, the inspection iteration corresponding to the reference iteration interval may be the initial iteration of the reference iteration interval. For example, if the current inspection iteration is the 100th iteration and the processor determines, according to the data variation range of the data to be quantized, that the reference iteration interval is 6, the processor can determine that the reference iteration interval includes 6 iterations: the 100th iteration to the 105th iteration.
• the processor can determine the target data bit width at the 100th iteration, and this target data bit width is used from the 101st to the 105th iteration, so there is no need to calculate the target data bit width in the 101st to 105th iterations, thereby reducing the amount of calculation and improving quantization efficiency and computing efficiency.
  • the 106th iteration can be used as the current inspection iteration, and the above operations of determining the reference iteration interval and updating the data bit width are repeated.
• the reference iteration interval may also be counted starting from the next iteration after the current inspection iteration; that is, the inspection iteration corresponding to the reference iteration interval may also be the termination iteration of the reference iteration interval.
  • the current inspection iteration is the 100th iteration
• the processor determines, according to the data variation range of the data to be quantized, that the reference iteration interval is 6. Then the processor may determine that the reference iteration interval includes 6 iterations: the 101st iteration to the 106th iteration.
• the processor can determine the target data bit width at the 100th iteration, and this target data bit width will be used from the 101st to the 106th iteration, so there is no need to calculate the target data bit width in the 101st to 106th iterations, which reduces the amount of calculation and improves quantization efficiency and computing efficiency.
  • the 106th iteration can be used as the current inspection iteration, and the above operations of determining the reference iteration interval and updating the data bit width are repeated.
• the processor adjusts the point position corresponding to the iterations in the reference iteration interval according to the acquired point position iteration interval and the data bit width corresponding to the reference iteration interval, so as to adjust quantization parameters such as the point position in the cyclic neural network operation.
  • the point position iteration interval includes at least one iteration, and the iterated point positions in the point position iteration interval are consistent.
  • the point position iteration interval may be a hyperparameter, for example, the point position iteration interval may be user-defined input.
  • the point position iteration interval is less than or equal to the reference iteration interval.
  • the processor can synchronously update the quantization parameters such as the data bit width and the point position at the current inspection iteration.
  • the scaling factors corresponding to iterations in the reference iteration interval may be consistent.
  • the offset corresponding to the iteration in the reference iteration interval is consistent.
  • the quantization parameters such as the data bit width and point positions corresponding to the iterations in the reference iteration interval are all the same, so that the amount of calculation can be reduced, and the quantization efficiency and computing efficiency can be improved.
  • the specific implementation process is basically the same as the foregoing embodiment, and may refer to the above description, which will not be repeated here.
• the processor can update quantization parameters such as the data bit width and the point position at the inspection iteration corresponding to the reference iteration interval, and update quantization parameters such as the point position at the sub-inspection iterations determined by the point position iteration interval. Since quantization parameters such as the point position can be fine-tuned according to the data to be quantized while the data bit width remains unchanged, the point position can also be adjusted within the same reference iteration interval to further improve quantization accuracy.
  • the processor may determine a sub-inspection iteration according to the current inspection iteration and the point position iteration interval, the sub-inspection iteration is used to adjust the point position, and the sub-inspection iteration may be an iteration in the reference iteration interval. Further, the processor may adjust the position of the point corresponding to the iteration in the reference iteration interval according to the data to be quantified in the sub-test iteration and the data bit width corresponding to the reference iteration interval.
• the way to determine the point position may refer to the above formula (2-2) or formula (2-14), which will not be repeated here.
  • the current inspection iteration is the 100th iteration
  • the reference iteration interval is 6, and the reference iteration interval includes iterations from the 100th iteration to the 105th iteration.
  • the processor may use the 100th iteration as the above-mentioned sub-test iteration, and calculate the point position s1 corresponding to the 100th iteration.
• the point position s1 is shared by the 100th, 101st, and 102nd iterations; that is, these iterations are quantized using the point position s1.
• according to the point position iteration interval I_s1, the processor can use the 103rd iteration as the aforementioned sub-inspection iteration, and the processor can determine, according to the data to be quantized in the 103rd iteration and the data bit width n corresponding to the reference iteration interval, the point position s2 corresponding to the second point position iteration interval; the 103rd to 105th iterations can then be quantized by sharing the aforementioned point position s2.
  • the values of the aforementioned point position s1 before update and the point position s2 after update may be the same or different.
  • the processor may determine the next reference iteration interval and the quantization parameters such as the data bit width and point position corresponding to the next reference iteration interval according to the data variation range of the data to be quantized again in the 106th iteration.
  • the current inspection iteration is the 100th iteration
  • the reference iteration interval is 6, and the reference iteration interval includes iterations from the 101st iteration to the 106th iteration.
  • the processor may determine that the point position corresponding to the first point position iteration interval is s1 according to the data to be quantified in the current inspection iteration and the target data bit width n1 corresponding to the current inspection iteration.
• the 101st, 102nd, and 103rd iterations share the above-mentioned point position s1 for quantization.
• according to the point position iteration interval I_s1, the processor can use the 104th iteration as the aforementioned sub-inspection iteration, and the processor can determine, according to the data to be quantized in the 104th iteration and the data bit width n1 corresponding to the reference iteration interval, the point position s2 corresponding to the second point position iteration interval; the 104th to 106th iterations can then be quantized by sharing the aforementioned point position s2.
  • the values of the aforementioned point position s1 before update and the point position s2 after update may be the same or different.
• the processor may determine the next reference iteration interval and the quantization parameters such as the data bit width and point position corresponding to the next reference iteration interval according to the data variation range of the data to be quantized again at the 106th iteration.
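• The asynchronous schedule in the two examples above can be sketched in Python as follows: the data bit width is held for the whole reference iteration interval while the point position is refreshed at each sub-inspection iteration. All helper names here are hypothetical stand-ins for the operations described above.

    def adjust_within_reference_interval(start, interval, point_interval,
                                         compute_bit_width, compute_point_position, run_iteration):
        n = compute_bit_width(start)               # bit width shared by the whole reference interval
        s = None
        for i in range(start, start + interval):
            if (i - start) % point_interval == 0:  # a sub-inspection iteration (e.g. the 100th, 103rd, ...)
                s = compute_point_position(i, n)   # fine-tune the point position; bit width unchanged
            run_iteration(i, n, s)                 # quantize this iteration with the shared n and current s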
  • the point position iteration interval may be equal to 1, that is, the point position is updated once for each iteration.
• the point position iteration intervals included in the reference iteration interval can be the same or different.
  • the iteration interval of at least one point position included in the reference iteration interval may increase sequentially.
  • the scaling factors corresponding to iterations in the reference iteration interval may also be inconsistent.
• the scaling factor can be updated synchronously with the aforementioned point position; that is, the iteration interval corresponding to the scaling factor can be equal to the aforementioned point position iteration interval. That is, whenever the processor updates the point position, it will update the scaling factor accordingly.
  • the offset corresponding to the iteration in the reference iteration interval may also be inconsistent.
• the offset can be updated synchronously with the aforementioned point position; that is, the iteration interval corresponding to the offset can be equal to the aforementioned point position iteration interval. That is, whenever the processor updates the point position, it will update the offset accordingly.
• the offset can also be updated asynchronously with the aforementioned point position or data bit width, which is not specifically limited here.
  • the processor may also determine the minimum and maximum values of all elements of the data to be quantized, and further determine quantization parameters such as point positions and scaling coefficients. For details, please refer to the above description.
• the processor may comprehensively determine the data variation range of the data to be quantized according to the variation range of the point position and the change of the data bit width of the data to be quantized, and determine the reference iteration interval according to that data variation range, where the reference iteration interval can be used to update the determined data bit width; that is, the processor can update the determined data bit width at the inspection iteration of each reference iteration interval. Since the point position can reflect the accuracy of the fixed-point data and the data bit width can reflect the data representation range of the fixed-point data, by integrating the variation range of the point position and the change of the data bit width of the data to be quantized, it can be ensured that the quantized data both achieves the required accuracy and satisfies the data representation range.
  • FIG. 3-16 shows a flowchart of a method for determining a first target iteration interval in a parameter adjustment method of another embodiment of the present disclosure. As shown in FIG. 3-16, the above method may include:
  • the first error can represent the variation range of the point position
• the variation range of the point position may represent the data variation range of the data to be quantized; specifically, for the calculation of the first error, refer to operation S110 above, which will not be repeated here.
  • the aforementioned second error may be determined according to the quantization error, and the second error is positively correlated with the aforementioned quantization error.
  • the foregoing operation S500 may include:
  • the second error is determined according to the quantization error, and the second error is positively correlated with the quantization error.
  • the quantized data of the current inspection iteration is obtained by quantizing the to-be-quantized data of the current inspection iteration according to the initial data bit width.
• for the quantization error determination method, please refer to the description in operation S114 above, which is not repeated here.
• the second error can be calculated according to the following formula:

    diff_update2 = θ * diff_bit²

• where diff_update2 represents the aforementioned second error, diff_bit represents the aforementioned quantization error, and θ may be a hyperparameter.
  • the processor may calculate the target error according to the first error and the second error, and determine the target iteration interval according to the target error.
  • the processor may determine a target iteration interval according to the target error, and the target iteration interval is negatively correlated with the target error. That is, the larger the target error, the smaller the target iteration interval.
• the target error can also be determined as the maximum of the first error and the second error, in which case the weight of the smaller of the first error and the second error is 0.
  • the foregoing operation S600 may include:
  • the first target iteration interval is determined according to the target error, wherein the target error is negatively correlated with the first target iteration interval.
  • the processor may compare the magnitude of the first error diff update1 and the second error diff update2 , and when the first error diff update1 is greater than the second error diff update2 , the target error is equal to the first error diff update1 .
• when the first error diff_update1 is less than or equal to the second error diff_update2, the target error is equal to the second error diff_update2.
  • the target error can be the first error diff update1 or the second error diff update2 . That is, the target error diff update can be determined according to the following formula:
• diff_update = max(diff_update1, diff_update2)   formula (2-35)
• where diff_update refers to the target error, diff_update1 refers to the first error, and diff_update2 refers to the second error.
  • the first target iteration interval can be determined as follows:
• the first target iteration interval can be calculated according to the following formula:

    I = β / diff_update − γ

• where I represents the target iteration interval, diff_update represents the above-mentioned target error, and β and γ may be hyperparameters.
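• Putting formula (2-35) together with the interval formula above, a minimal Python sketch follows; the quadratic form of the second error and the combination β/diff_update − γ follow the formulas as reconstructed above, and should be read as assumptions where this excerpt is garbled.

    def first_target_iteration_interval(diff_update1, quant_error, theta, beta, gamma):
        diff_update2 = theta * quant_error ** 2        # second error, positively correlated with the quantization error
        diff_update = max(diff_update1, diff_update2)  # target error, formula (2-35)
        interval = int(beta / diff_update - gamma)     # negatively correlated with the target error
        return max(1, interval)                        # an interval contains at least one iteration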
  • the data bit width is variable in the cyclic neural network operation, and the change trend of the data bit width can be measured by the second error.
• the processor can determine the second target iteration interval and the data bit width corresponding to the iterations in the second target iteration interval, wherein the data bit widths corresponding to the iterations in the second target iteration interval are consistent.
  • the processor may determine the data bit width corresponding to the second target iteration interval according to the to-be-quantized data of the current inspection iteration. That is to say, the data bit width during the operation of the cyclic neural network is updated every second target iteration interval.
  • the data bit width corresponding to the second target iteration interval may be the target data bit width of the current inspection iteration.
• for the target data bit width of the current inspection iteration, please refer to operations S114 and S115 above, which will not be repeated here.
• the second target iteration interval may be counted starting from the current inspection iteration; that is, the inspection iteration corresponding to the second target iteration interval may be the initial iteration of the second target iteration interval. For example, if the current inspection iteration is the 100th iteration and the processor determines, according to the data variation range of the data to be quantized, that the second target iteration interval is 6, the processor may determine that the second target iteration interval includes 6 iterations: the 100th iteration to the 105th iteration.
• the processor can determine the target data bit width at the 100th iteration, and this target data bit width is used from the 101st to the 105th iteration, so there is no need to calculate the target data bit width in the 101st to 105th iterations, thereby reducing the amount of calculation and improving quantization efficiency and computing efficiency.
  • the 106th iteration can be used as the current inspection iteration, and the above operations of determining the second target iteration interval and updating the data bit width are repeated.
• the second target iteration interval may also be counted starting from the next iteration after the current inspection iteration; that is, the inspection iteration corresponding to the second target iteration interval may also be the termination iteration of the second target iteration interval.
  • the current inspection iteration is the 100th iteration
  • the processor determines that the iteration interval of the second target iteration interval is 6 according to the data variation range of the data to be quantified.
  • the processor may determine that the second target iteration interval includes 6 iterations, which are respectively the 101st iteration to the 106th iteration.
• the processor can determine the target data bit width at the 100th iteration, and this target data bit width will be used from the 101st to the 106th iteration, so there is no need to calculate the target data bit width in the 101st to 106th iterations, which reduces the amount of calculation and improves quantization efficiency and computing efficiency.
  • the 106th iteration can be used as the current inspection iteration, and the above operations of determining the target iteration interval and updating the data bit width are repeated.
• the processor may also determine the quantization parameters in the second target iteration interval at the inspection iteration, so as to adjust the quantization parameters in the cyclic neural network operation according to the second target iteration interval. That is, the quantization parameters such as the point position in the cyclic neural network operation can be updated synchronously with the data bit width.
  • the quantization parameters corresponding to the iterations in the second target iteration interval may be consistent.
  • the processor may determine the point position corresponding to the current inspection iteration according to the data to be quantified in the current inspection iteration and the target data bit width corresponding to the current inspection iteration, and use the point position corresponding to the current inspection iteration as the second The position of the point corresponding to the target iteration interval, wherein the position of the point corresponding to the iteration in the second target iteration interval is consistent.
• each iteration in the second target iteration interval uses quantization parameters such as the point position of the current inspection iteration, which avoids updating and adjusting the quantization parameters in every iteration, thereby reducing the amount of calculation in the quantization process and improving the efficiency of the quantization operation.
  • the scaling factors corresponding to iterations in the second target iteration interval may be consistent.
  • the processor may determine the scaling factor corresponding to the current testing iteration according to the data to be quantized in the current testing iteration, and use the scaling factor corresponding to the current testing iteration as the scaling factor of each iteration in the second target iteration interval. Wherein, the scaling factors corresponding to iterations in the second target iteration interval are consistent.
  • the offsets corresponding to the iterations in the second target iteration interval are consistent.
  • the processor may determine the offset corresponding to the current inspection iteration according to the to-be-quantized data of the current inspection iteration, and use the offset corresponding to the current inspection iteration as the offset of each iteration in the second target iteration interval. Further, the processor may also determine the minimum value and the maximum value among all the elements of the data to be quantized, and further determine quantization parameters such as point positions and scaling factors. For details, please refer to the above description.
• the second target iteration interval may be counted starting from the current inspection iteration; that is, the inspection iteration corresponding to the second target iteration interval may be the initial iteration of the second target iteration interval. For example, if the current inspection iteration is the 100th iteration and the processor determines, according to the data variation range of the data to be quantized, that the second target iteration interval is 3, the processor may determine that the second target iteration interval includes 3 iterations: the 100th, the 101st, and the 102nd iteration.
• the processor can determine the quantization parameters such as the point position corresponding to the 100th iteration according to the data to be quantized and the target data bit width corresponding to the 100th iteration, and can use these quantization parameters to quantize the 100th, 101st, and 102nd iterations.
  • the processor does not need to calculate quantization parameters such as point positions in the 101st iteration and the 102nd iteration, which reduces the amount of calculation in the quantization process and improves the efficiency of the quantization operation.
  • the second target iteration interval may also be calculated from the next iteration of the current inspection iteration, that is, the inspection iteration corresponding to the second target iteration interval may also be the termination iteration of the second target iteration interval.
  • the current inspection iteration is the 100th iteration
  • the processor determines that the iteration interval of the second target iteration interval is 3 according to the data variation range of the data to be quantified. Then the processor may determine that the second target iteration interval includes 3 iterations, which are respectively the 101st iteration, the 102nd iteration, and the 103rd iteration.
• the processor can determine the quantization parameters such as the point position corresponding to the 100th iteration according to the data to be quantized and the target data bit width corresponding to the 100th iteration, and can use these quantization parameters to quantize the 101st, 102nd, and 103rd iterations.
  • the processor does not need to calculate quantization parameters such as point positions in the 102nd iteration and the 103rd iteration, which reduces the amount of calculation in the quantization process and improves the efficiency of the quantization operation.
• the data bit widths and quantization parameters corresponding to each iteration in the same second target iteration interval are consistent; that is, the data bit width, point position, scaling factor, and offset corresponding to each iteration in the same second target iteration interval all remain unchanged, so that during the training or fine-tuning of the recurrent neural network, frequent adjustment of the quantization parameters of the data to be quantized can be avoided, reducing the amount of calculation in the quantization process and improving quantization efficiency while guaranteeing quantization accuracy.
• the processor may also determine the quantization parameters in the second target iteration interval according to the point position iteration interval corresponding to quantization parameters such as the point position, so as to adjust the quantization parameters in the cyclic neural network operation. That is, quantization parameters such as the point position in the cyclic neural network operation can be updated asynchronously with the data bit width: the processor can update quantization parameters such as the data bit width and the point position at the inspection iteration of the second target iteration interval, and the processor can also separately update the point position corresponding to the iterations in the second target iteration interval according to the point position iteration interval.
• the processor may also determine the data bit width corresponding to the second target iteration interval according to the target data bit width corresponding to the current inspection iteration, where the data bit widths corresponding to the iterations in the second target iteration interval are consistent. After that, the processor can adjust quantization parameters such as the point position in the cyclic neural network operation according to the data bit width corresponding to the second target iteration interval and the point position iteration interval; that is, after the data bit width corresponding to the second target iteration interval is determined, the point position corresponding to the iterations in the second target iteration interval is adjusted according to the acquired point position iteration interval and that data bit width, so as to adjust the point position in the recurrent neural network operation.
  • the point position iteration interval includes at least one iteration, and the iterated point positions in the point position iteration interval are consistent.
  • the point position iteration interval may be a hyperparameter, for example, the point position iteration interval may be user-defined input.
• the above-mentioned method can be used in the training or fine-tuning process of the cyclic neural network, so as to adjust the quantization parameters of the operation data involved in the training or fine-tuning of the cyclic neural network, thereby improving the quantization precision and efficiency of the cyclic neural network.
  • the operation data may be at least one of neuron data, weight data or gradient data.
• when the training or fine-tuning of the cyclic neural network tends to be stable (that is, when the forward operation result of the cyclic neural network approaches the preset reference value), the value of the target iteration interval can be further increased to further improve quantization efficiency and computing efficiency.
  • different methods can be used to determine the target iteration interval at different stages of the training or fine-tuning of the recurrent neural network, so as to improve the quantization efficiency and computing efficiency on the basis of ensuring the quantization accuracy.
  • FIG. 3-17 shows a flowchart of a method for adjusting a quantization parameter according to another embodiment of the present disclosure. As shown in FIG. 3-17, the above method may further include:
  • the processor may further perform operation S712, that is, the processor may further determine whether the current iteration is greater than the second preset iteration.
  • the second preset iteration is greater than the first preset iteration
  • the second preset iteration interval is greater than the first preset iteration interval.
  • the foregoing second preset iteration may be a hyperparameter, and the second preset iteration may be greater than the total number of iterations of at least one training period.
  • the second preset iteration may be determined according to the data variation curve of the data to be quantified.
  • the second preset iteration may also be customized by the user.
• the processor may perform operation S714, use the second preset iteration interval as the target iteration interval, and adjust the parameters of the neural network quantization process according to the second preset iteration interval.
• when the current iteration is greater than the first preset iteration and less than the second preset iteration, the processor may perform the above-mentioned operation S713, determine the target iteration interval according to the data variation range of the data to be quantized, and adjust the quantization parameter according to the target iteration interval.
• the processor may read the second preset iteration set by the user, and determine the second preset iteration interval according to the corresponding relationship between the second preset iteration and the second preset iteration interval, where the second preset iteration interval is greater than the first preset iteration interval.
  • the degree of convergence of the neural network satisfies a preset condition
  • the forward operation result of the current iteration approaches the preset reference value
  • the current iteration is greater than or equal to the second preset iteration.
• if the loss value corresponding to the current iteration is less than or equal to the preset threshold, it can be determined that the degree of convergence of the neural network meets the preset condition.
  • the aforementioned second preset iteration interval may be a hyperparameter, and the second preset iteration interval may be greater than or equal to the total number of iterations of at least one training period.
  • the second preset iteration interval may be customized by the user.
  • the processor can directly read the second preset iteration and the second preset iteration interval input by the user, and update the quantization parameter in the neural network operation according to the second preset iteration interval.
  • the second preset iteration interval may be equal to the total number of iterations of one training period, that is, the quantization parameter is updated once every training period (epoch).
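• The staged behavior described above (a preset interval early in training, a data-driven interval in the middle stage, and one update per training period once training stabilizes) can be sketched as follows in Python; the threshold names and the variable_interval callback are hypothetical stand-ins for the operations in this excerpt.

    def choose_update_interval(current_iter, first_preset_iter, second_preset_iter,
                               first_preset_interval, iters_per_epoch, variable_interval):
        if current_iter <= first_preset_iter:      # early training: data fluctuates strongly
            return first_preset_interval           # small preset interval, frequent updates
        if current_iter < second_preset_iter:      # middle stage: interval follows the data
            return variable_interval(current_iter) # from the data variation range (operation S713)
        return iters_per_epoch                     # converged: update once per training period (S714)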
  • the above method also includes:
  • the processor may also determine whether the current data bit width needs to be adjusted at each inspection iteration. If the current data bit width needs to be adjusted, the processor may switch from the above-mentioned operation S714 to operation S713 to re-determine the data bit width so that the data bit width can meet the requirements of the data to be quantized.
  • the processor may determine whether the data bit width needs to be adjusted according to the aforementioned second error.
  • the processor may also perform the above operation S715 to determine whether the second error is greater than a preset error value, and when the current iteration is greater than or equal to the second preset iteration and the second error is greater than the preset error value, switch to perform the operation S713: Determine an iteration interval according to the data variation range of the data to be quantized, so as to re-determine the data bit width according to the iteration interval.
  • the preset error value may be determined according to the preset threshold corresponding to the quantization error.
• the processor may determine the iteration interval according to the data variation range of the data to be quantized, so as to re-determine the data bit width according to the iteration interval.
  • the second preset iteration interval is the total number of iterations in one training period.
  • the processor may update the quantization parameter according to the second preset iteration interval, that is, the quantization parameter is updated once every training period (epoch).
  • the initial iteration of each training period is regarded as a test iteration.
• the processor can determine the quantization error according to the data to be quantized in the test iteration, determine the second error according to the quantization error, and determine whether the second error is greater than the preset error value according to the following formula:

    diff_update2 = θ * diff_bit² > T

• where diff_update2 represents the second error, diff_bit represents the quantization error, θ represents the hyperparameter, and T represents the preset error value.
  • the preset error value may be equal to the first preset threshold divided by the hyperparameter.
  • the preset error value may also be a hyperparameter.
• if the second error diff_update2 is greater than the preset error value T, it means that the data bit width may not meet the preset requirements. In this case, the second preset iteration interval may no longer be used to update the quantization parameters; instead, the processor may determine the target iteration interval according to the data variation range of the data to be quantized, to ensure that the data bit width meets the preset requirements. That is, when the second error diff_update2 is greater than the preset error value T, the processor switches from the aforementioned operation S714 to the aforementioned operation S713.
  • the processor may determine whether the data bit width needs to be adjusted according to the aforementioned quantization error.
  • the second preset iteration interval is the total number of iterations in one training period.
  • the processor may update the quantization parameter according to the second preset iteration interval, that is, the quantization parameter is updated once every training period (epoch).
  • the initial iteration of each training cycle is used as a test iteration.
• the processor can determine the quantization error according to the data to be quantized in the test iteration, and when the quantization error is greater than or equal to the first preset threshold, it means that the data bit width may not meet the preset requirements; in this case, the processor switches from the above operation S714 to the above operation S713.
  • the aforementioned quantization parameters such as the position of the point, the scaling factor, and the offset may be displayed by a display device.
  • the user can learn the quantization parameter during the operation of the cyclic neural network through the display device, and the user can also adaptively modify the quantization parameter determined by the processor.
  • the aforementioned data bit width and target iteration interval can also be displayed by the display device.
  • the user can learn the parameters such as the target iteration interval and data bit width during the operation of the cyclic neural network through the display device, and the user can also adaptively modify the target iteration interval and data bit width determined by the processor.
  • An embodiment of the present disclosure also provides a quantization parameter adjustment device 200 of a cyclic neural network.
  • the quantization parameter adjustment device 200 may be installed in a processor.
  • the quantization parameter adjustment device 200 can be placed in a general-purpose processor.
  • the quantization parameter adjustment device can also be placed in an artificial intelligence processor.
• FIG. 3-18 shows a structural block diagram of a quantization parameter adjustment device according to an embodiment of the present disclosure. As shown in FIG. 3-18, the device includes an acquisition module 210 and an iteration interval determination module 220.
  • the obtaining module 210 is used to obtain the data change range of the data to be quantified
  • the iteration interval determination module 220 is configured to determine a first target iteration interval according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the cyclic neural network operation according to the first target iteration interval, wherein The target iteration interval includes at least one iteration, and the quantization parameter of the cyclic neural network is used to implement a quantization operation on the data to be quantized in the operation of the cyclic neural network.
  • the device further includes:
  • the preset interval determination module is configured to adjust the quantization parameter according to the preset iteration interval when the current inspection iteration is less than or equal to the first preset iteration.
  • the iteration interval determination module is further configured to determine the first target iteration interval according to the data variation range of the data to be quantified when the current inspection iteration is greater than the first preset iteration.
  • the iteration interval determination module includes:
• the second target iteration interval determination sub-module: when the current inspection iteration is greater than or equal to the second preset iteration and the current inspection iteration requires quantization parameter adjustment, it determines the second target iteration interval corresponding to the current inspection iteration according to the first target iteration interval and the total number of iterations in each cycle;
• the update iteration determination sub-module determines the update iteration corresponding to the current inspection iteration according to the second target iteration interval, so as to adjust the quantization parameter at the update iteration, where the update iteration is an iteration after the current inspection iteration;
  • the second preset iteration is greater than the first preset iteration
  • the quantitative adjustment process of the cyclic neural network includes multiple cycles, and the total number of iterations in the multiple cycles is not consistent.
  • the second target iteration interval determination submodule includes:
• the update cycle determination sub-module determines the update cycle corresponding to the current inspection iteration according to the ordinal number of the current inspection iteration in the current cycle and the total number of iterations in the cycles after the current cycle, wherein the total number of iterations in the update cycle is greater than or equal to that ordinal number;
• the determining sub-module determines the second target iteration interval according to the first target iteration interval, the ordinal number, and the total number of iterations in the cycles between the current cycle and the update cycle.
  • the iteration interval determination module is further configured to determine that the current inspection iteration is greater than or equal to a second preset iteration when the degree of convergence of the cyclic neural network meets a preset condition.
  • the quantization parameter includes a point position, and the point position is the position of a decimal point in the quantization data corresponding to the data to be quantized; the device further includes:
• the quantization parameter determination module is used to determine the point position corresponding to the iterations in the reference iteration interval according to the target data bit width corresponding to the current inspection iteration and the data to be quantized of the current inspection iteration, so as to adjust the point position in the cyclic neural network operation;
  • the reference iteration interval includes the second target iteration interval or the preset iteration interval.
  • the quantization parameter includes a point position, and the point position is the position of a decimal point in the quantization data corresponding to the data to be quantized; the device further includes:
  • the data bit width determining module is configured to determine the data bit width corresponding to the reference iteration interval according to the target data bit width corresponding to the current inspection iteration, wherein the data bit width corresponding to the iteration in the reference iteration interval is consistent, and the reference The iteration interval includes the second target iteration interval or the preset iteration interval;
  • the quantization parameter determination module is configured to adjust the point position corresponding to the iteration in the reference iteration interval according to the acquired point position iteration interval and the data bit width corresponding to the reference iteration interval to adjust the point position in the neural network operation;
  • the point position iteration interval includes at least one iteration, and the iterated point positions in the point position iteration interval are consistent.
  • the point position iteration interval is less than or equal to the reference iteration interval.
  • the quantization parameter further includes a scaling factor, and the scaling factor is updated synchronously with the point position.
  • the quantization parameter further includes an offset, and the offset is updated synchronously with the point position.
  • the data bit width determination module includes:
• the quantization error determination sub-module is used to determine the quantization error according to the data to be quantized of the current inspection iteration and the quantized data of the current inspection iteration, wherein the quantized data of the current inspection iteration is obtained by quantizing the data to be quantized of the current inspection iteration;
  • the data bit width determination sub-module is used to determine the target data bit width corresponding to the current inspection iteration according to the quantization error.
• the data bit width determining unit is configured to determine the target data bit width corresponding to the current inspection iteration according to the quantization error, specifically:
• if the quantization error is greater than or equal to a first preset threshold, increase the data bit width corresponding to the current inspection iteration to obtain a first intermediate data bit width, and return to determining the quantization error according to the data to be quantized of the current inspection iteration and the quantized data of the current inspection iteration until the quantization error is less than the first preset threshold; wherein the quantized data of the current inspection iteration is obtained by quantizing the data to be quantized of the current inspection iteration according to the first intermediate data bit width;
• if the quantization error is less than or equal to a second preset threshold, reduce the data bit width corresponding to the current inspection iteration to obtain a second intermediate data bit width, and return to determining the quantization error according to the data to be quantized of the current inspection iteration and the quantized data of the current inspection iteration until the quantization error is greater than the second preset threshold; wherein the quantized data of the current inspection iteration is obtained by quantizing the data to be quantized of the current inspection iteration according to the second intermediate data bit width.
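• The two branches of the data bit width determining unit amount to the following loop, shown as a hedged Python sketch; quantization_error(data, n) is a hypothetical helper that quantizes the data to be quantized at bit width n and returns the resulting quantization error.

    def adjust_bit_width(data, bit_width, first_threshold, second_threshold, quantization_error):
        err = quantization_error(data, bit_width)
        if err >= first_threshold:                 # error too large: widen until below the first threshold
            while err >= first_threshold:
                bit_width += 1
                err = quantization_error(data, bit_width)
        elif err <= second_threshold:              # error needlessly small: narrow until above the second threshold
            while err <= second_threshold and bit_width > 2:
                bit_width -= 1
                err = quantization_error(data, bit_width)
        return bit_width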
  • the acquisition module includes:
• the first acquisition module is used to acquire the variation range of the point position, wherein the variation range of the point position can be used to characterize the data variation range of the data to be quantized, and the variation range of the point position is positively correlated with the data variation range of the data to be quantized.
  • the first obtaining module includes:
• the first mean value determining unit is configured to determine the first mean value according to the point position corresponding to the last inspection iteration before the current inspection iteration and the point positions corresponding to the historical iterations before that last inspection iteration, wherein the last inspection iteration is the inspection iteration corresponding to the previous iteration interval before the target iteration interval;
• the second mean value determining unit is configured to determine the second mean value according to the point position corresponding to the current inspection iteration and the point positions of the historical iterations before the current inspection iteration; wherein the point position corresponding to the current inspection iteration is determined according to the target data bit width corresponding to the current inspection iteration and the data to be quantized;
  • the first error determining unit is configured to determine a first error according to the first average value and the second average value, and the first error is used to characterize the variation range of the point position.
  • the second average value determining unit is specifically configured to:
  • the second average value is determined according to the point position of the current inspection iteration and the preset number of intermediate sliding average values.
  • the second average value determining unit is specifically configured to determine the second average value according to a point position corresponding to the current inspection iteration and the first average value.
  • the second average value determining unit is configured to update the second average value according to the acquired data bit width adjustment value of the current inspection iteration
  • the data bit width adjustment value of the current inspection iteration is determined according to the target data bit width and the initial data bit width of the current inspection iteration.
• the second average value determining unit is configured to update the second average value according to the acquired data bit width adjustment value of the current inspection iteration, specifically: when the data bit width of the current inspection iteration increases, reduce the second average value according to the data bit width adjustment value of the current inspection iteration; and when the data bit width of the current inspection iteration decreases, increase the second average value according to the data bit width adjustment value of the current inspection iteration.
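• A minimal sketch of the first-error computation carried out by the mean value determining units is given below; the exponential sliding-average form and the smoothing constant alpha are assumptions, since the excerpt only states that the means are sliding averages of point positions.

    def point_position_first_error(s_current, m1, alpha=0.9):
        # m1: first mean, a sliding average of point positions up to the last inspection iteration
        m2 = alpha * m1 + (1 - alpha) * s_current  # second mean: fold in the current point position
        diff_update1 = abs(m2 - m1)                # first error: variation range of the point position
        return diff_update1, m2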
  • the iteration interval determination module is configured to determine the target iteration interval according to the first error, and the target iteration interval is negatively correlated with the first error.
  • the acquisition module further includes:
  • the second acquisition module is used to acquire the change trend of the data bit width; determine the data change range of the data to be quantified according to the change range of the point position and the change trend of the data bit width.
• the iteration interval determination module is further configured to determine the target iteration interval according to the acquired first error and second error; wherein the first error is used to characterize the variation range of the point position, and the second error is used to characterize the change trend of the data bit width.
  • the iteration interval determination module is configured to determine the target iteration interval according to the acquired first error and second error, specifically for:
  • the target iteration interval is determined according to the target error, wherein the target error is negatively related to the target iteration interval.
  • the second error is determined according to the quantization error
  • the quantization error is determined based on the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration, and the second error is positively correlated with the quantization error.
  • the iteration interval determination module is further configured to: when the current inspection iteration is greater than or equal to the second preset iteration, and the second error is greater than the preset error value, then according to the waiting The data variation range of the quantified data determines the first target iteration interval.
• the working principle of each module or unit in the embodiment of the present application is basically the same as the implementation process of each operation in the foregoing method, and may refer to the above description.
  • the division of the units/modules in the foregoing embodiment is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the above-mentioned integrated units/modules can be implemented in the form of hardware or software program modules. If the integrated unit/module is implemented in the form of hardware, the hardware may be a digital circuit, an analog circuit, and so on.
  • the physical realization of the hardware structure includes but is not limited to transistors, memristors and so on.
• if the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory.
• based on this understanding, the essence of the technical solution of the present disclosure, the part that contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product; the computer software product is stored in a memory and includes a number of instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned memory includes: U disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, magnetic disk or optical disk and other various media that can store program codes.
  • the present disclosure also provides a computer-readable storage medium in which a computer program is stored.
• when the computer program is executed, the method as in any of the above embodiments is implemented.
• when the computer program is executed by a processor or device, the following method is implemented:
  • the quantization parameter of the neural network is used to implement the quantization operation of the data to be quantized in the cyclic neural network operation.
• Clause B1 A method for adjusting quantization parameters of a recurrent neural network, the method comprising:
  • the quantization parameter of the cyclic neural network is used to implement the quantization operation of the data to be quantized in the operation of the cyclic neural network.
  • Clause B2 The method according to Clause B1, the method further comprising:
  • the quantization parameter is adjusted according to the preset iteration interval.
  • determining the first target iteration interval according to the data change range of the data to be quantified includes:
  • the first target iteration interval is determined according to the data variation range of the data to be quantified.
  • a first target iteration interval is determined according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, including:
  • the second target iteration interval and the total number of iterations in each cycle are used to determine the second inspection iteration corresponding to the current inspection iteration.
  • the second preset iteration is greater than the first preset iteration
  • the quantization adjustment process of the recurrent neural network includes multiple cycles, and the total numbers of iterations in the multiple cycles are not consistent.
  • determining the second target iteration interval corresponding to the current inspection iteration according to the first target iteration interval and the total number of iterations in each cycle includes:
  • the second target iteration interval is determined according to the first target iteration interval, the iteration ordinal number, and the total number of iterations in the cycles between the current cycle and the update cycle.
  • the first target iteration interval is determined according to the data variation range of the data to be quantified, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, Also includes:
  • Clause B7 The method according to clause B4, wherein the quantization parameter includes a point position, and the point position is the position of a decimal point in the quantization data corresponding to the data to be quantized; the method further includes:
  • the target data bit width corresponding to the current inspection iteration and the data to be quantified in the current inspection iteration determine the point position corresponding to the iteration in the reference iteration interval to adjust the point position in the cyclic neural network operation
  • the reference iteration interval includes the second target iteration interval or the preset iteration interval.
  • Clause B8 The method according to clause B4, wherein the quantization parameter includes a point position, and the point position is the position of a decimal point in the quantization data corresponding to the data to be quantized; the method further includes:
  • the point position iteration interval includes at least one iteration, and the iterated point positions in the point position iteration interval are consistent.
  • Clause B9 The method according to clause B8, wherein the point position iteration interval is less than or equal to the reference iteration interval.
  • Clause B10 The method according to any one of clauses B7 to B9, wherein the quantization parameter further includes a scaling factor, and the scaling factor is updated synchronously with the point position.
  • Clause B11 The method according to any one of clauses B7 to B9, wherein the quantization parameter further includes an offset, and the offset is updated synchronously with the point position.
  • Clause B12 The method according to any one of clauses B7 to B9, the method further comprising:
  • the target data bit width corresponding to the current inspection iteration is determined.
  • the data bit width corresponding to the current inspection iteration is reduced to obtain the target data bit width corresponding to the current inspection iteration.
  • Clause B14 According to the method of clause B13, increasing the data bit width corresponding to the current inspection iteration to obtain the target data bit width corresponding to the current inspection iteration, if the quantization error is greater than or equal to the first preset threshold, includes:
  • the return execution determines the quantization error according to the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration, until the quantization error is less than the first preset threshold; wherein, the quantized data of the current inspection iteration It is obtained by quantizing the to-be-quantized data of the current inspection iteration according to the bit width of the first intermediate data.
  • Clause B15 The method according to clause B13, wherein if the quantization error is less than or equal to a second preset threshold, reducing the data bit width corresponding to the current inspection iteration includes:
  • the return execution determines the quantization error according to the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration, until the quantization error is greater than the second preset threshold; wherein the quantized data of the current inspection iteration It is obtained by quantizing the to-be-quantized data of the current inspection iteration according to the bit width of the second intermediate data.
  • the obtaining the data variation range of the data to be quantified includes:
  • the variation range of the point position can be used to characterize the data variation range of the data to be quantized, and the variation range of the point position is positively correlated with the data variation range of the data to be quantized.
  • acquiring the variation range of the point position includes:
  • a first error is determined according to the first average value and the second average value, and the first error is used to characterize the variation range of the point position.
  • determining the second mean value according to the point position corresponding to the current inspection iteration and the point position of the historical iteration before the current inspection iteration including:
  • the second average value is determined according to the point position of the current inspection iteration and the preset number of intermediate sliding average values.
  • the second average value is determined according to the point position corresponding to the current inspection iteration and the first average value.
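  • A minimal sketch of how the two mean values and the first error can be maintained, assuming exponential moving averages of the point position (the smoothing factor alpha is an illustrative assumption):

```python
def sliding_mean(prev_mean, point_position, alpha=0.9):
    # Moving average of the point position across inspection iterations.
    return alpha * prev_mean + (1 - alpha) * point_position

m1 = sliding_mean(prev_mean=4.0, point_position=5.0)  # up to the previous inspection iteration
m2 = sliding_mean(prev_mean=m1, point_position=6.0)   # including the current inspection iteration
first_error = abs(m2 - m1)  # characterizes the variation range of the point position
```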
  • Clause B20 The method according to Clause B17, the method further comprising:
  • the second average value is updated according to the acquired data bit width adjustment value of the current inspection iteration; wherein the data bit width adjustment value of the current inspection iteration is determined according to the target data bit width and the initial data bit width of the current inspection iteration.
  • the second average value is reduced according to the data bit width adjustment value of the current inspection iteration
  • the second average value is increased according to the data bit width adjustment value of the current inspection iteration.
  • the first target iteration interval is determined according to the first error, and the first target iteration interval is negatively correlated with the first error.
  • said obtaining the data variation range of the data to be quantified further includes:
  • the data change range of the data to be quantized is determined.
  • determining the first target iteration interval according to the data variation range of the data to be quantified further includes:
  • the first target iteration interval is determined according to the acquired first error and second error; wherein the first error is used to characterize the variation range of the point position, and the second error is used to characterize the change trend of the data bit width.
  • determining the first target iteration interval according to the acquired first error and second error includes:
  • the first target iteration interval is determined according to the target error, wherein the target error is negatively correlated with the first target iteration interval.
  • Clause B26 According to the method described in Clause B24 or Clause B25, the second error is determined according to the quantization error
  • the quantization error is determined according to the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration, and the second error is positively correlated with the quantization error.
  • Clause B27 The method according to Clause B4, the method further comprising:
  • the first target iteration interval is determined according to the data variation range of the data to be quantified.
  • Clause B28 The method according to any one of clause B1 to clause B27, wherein the data to be quantified is at least one of neuron data, weight data, or gradient data.
  • Clause B29 A quantization parameter adjustment device of a recurrent neural network, comprising a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, the steps of the method described in any one of clauses B1 to B28 are implemented.
  • Clause B30 A computer-readable storage medium with a computer program stored in the computer-readable storage medium, which, when executed, realizes the steps of the method described in any one of clauses B1 to B28.
  • Clause B31 A quantization parameter adjustment device of a recurrent neural network, comprising:
  • the obtaining module is used to obtain the data change range of the data to be quantified
  • the iteration interval determination module is configured to determine a first target iteration interval according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, wherein the first target iteration interval includes at least one iteration, and the quantization parameter of the recurrent neural network is used to implement the quantization operation on the data to be quantized in the operation of the recurrent neural network.
  • Clause B32 The device according to Clause B31, the device further comprising:
  • the preset interval determination module is configured to adjust the quantization parameter according to the preset iteration interval when the current inspection iteration is less than or equal to the first preset iteration.
  • the iteration interval determination module is further configured to determine the first target iteration interval according to the data variation range of the data to be quantified when the current inspection iteration is greater than the first preset iteration.
  • Clause B34 The device according to any one of clauses B31 to B33, wherein the iteration interval determination module includes:
  • the second target iteration interval determination sub-module is configured to, when the current inspection iteration is greater than or equal to the second preset iteration and the current inspection iteration requires quantization parameter adjustment, determine the second target iteration interval corresponding to the current inspection iteration according to the first target iteration interval and the total number of iterations in each cycle;
  • the update iteration determination sub-module is configured to determine the update iteration corresponding to the current inspection iteration according to the second target iteration interval, so as to adjust the quantization parameter in the update iteration, the update iteration being an iteration after the current inspection iteration;
  • the second preset iteration is greater than the first preset iteration
  • the quantization adjustment process of the recurrent neural network includes multiple cycles, and the total numbers of iterations in the multiple cycles are not consistent.
  • the update cycle determination sub-module determines the update cycle corresponding to the current inspection iteration according to the number of iterations in the current cycle of the current inspection iteration and the total numbers of iterations in the cycles after the current cycle, wherein the total number of iterations in the update cycle is greater than or equal to the iteration ordinal number;
  • the determining sub-module determines the second target iteration interval according to the first target iteration interval, the iteration ordinal number, and the total number of iterations in the cycles between the current cycle and the update cycle.
  • the iteration interval determination module is further configured to determine that the current inspection iteration is greater than or equal to a second preset iteration when the degree of convergence of the cyclic neural network meets a preset condition.
  • Clause B37 The device according to clause B34, wherein the quantization parameter includes a point position, and the point position is a position of a decimal point in the quantization data corresponding to the data to be quantized; the device further includes:
  • the quantization parameter determination module is used to determine the point position corresponding to the iteration in the reference iteration interval according to the target data bit width corresponding to the current inspection iteration and the to-be-quantized data of the current inspection iteration to adjust the points in the cyclic neural network operation position;
  • the reference iteration interval includes the second target iteration interval or the preset iteration interval.
  • Clause B38 The device according to Clause B34, wherein the quantization parameter includes a point position, and the point position is the position of a decimal point in the quantization data corresponding to the data to be quantized; the device further includes:
  • the data bit width determining module is configured to determine the data bit width corresponding to the reference iteration interval according to the target data bit width corresponding to the current inspection iteration, wherein the data bit width corresponding to the iteration in the reference iteration interval is consistent, and the reference The iteration interval includes the second target iteration interval or the preset iteration interval;
  • the quantization parameter determination module is configured to adjust the point position corresponding to the iteration in the reference iteration interval according to the acquired point position iteration interval and the data bit width corresponding to the reference iteration interval to adjust the point position in the neural network operation;
  • the point position iteration interval includes at least one iteration, and the iterated point positions in the point position iteration interval are consistent.
  • Clause B40 The device according to any one of clauses B37 to B39, wherein the quantization parameter further includes a scaling factor, and the scaling factor is updated synchronously with the point position.
  • Clause B41 The device according to any one of clauses B37 to B39, wherein the quantization parameter further includes an offset, and the offset is updated synchronously with the point position.
  • Clause B42 The device according to any one of clauses B37 to B39, wherein the data bit width determination module includes:
  • the quantization error determination sub-module is used to determine the quantization error according to the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration, wherein the quantized data of the current inspection iteration is obtained by quantizing the data to be quantized in the current inspection iteration;
  • the data bit width determination sub-module is used to determine the target data bit width corresponding to the current inspection iteration according to the quantization error.
  • the data bit width corresponding to the current inspection iteration is reduced to obtain the target data bit width corresponding to the current inspection iteration.
  • Clause B44 The device according to clause B43, wherein the data bit width determining unit, when configured to increase the data bit width corresponding to the current inspection iteration to obtain the target data bit width corresponding to the current inspection iteration if the quantization error is greater than or equal to a first preset threshold, is specifically used for:
  • the return execution determines the quantization error according to the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration, until the quantization error is less than the first preset threshold; wherein, the quantized data of the current inspection iteration It is obtained by quantizing the to-be-quantized data of the current inspection iteration according to the bit width of the first intermediate data.
  • Clause B45 The device according to clause B43, wherein the data bit width determining unit, when configured to reduce the data bit width corresponding to the current inspection iteration to obtain the target data bit width corresponding to the current inspection iteration if the quantization error is less than or equal to a second preset threshold, is specifically used for:
  • the return execution determines the quantization error according to the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration, until the quantization error is greater than the second preset threshold; wherein the quantized data of the current inspection iteration It is obtained by quantizing the to-be-quantized data of the current inspection iteration according to the bit width of the second intermediate data.
  • the first acquisition module is used to acquire the variation range of the point position; wherein the variation range of the point position can be used to characterize the data variation range of the data to be quantized, and the variation range of the point position is positively correlated with the data variation range of the data to be quantized.
  • Clause B47 The device according to clause B46, wherein the first obtaining module includes:
  • the first mean value determining unit is configured to determine the first mean value according to the point position corresponding to the previous inspection iteration before the current inspection iteration and the point positions of the historical iterations before the previous inspection iteration, wherein the previous inspection iteration is the inspection iteration corresponding to the previous iteration interval before the target iteration interval;
  • the second mean value determining unit is configured to determine the second mean value according to the point position corresponding to the current inspection iteration and the point position of the historical iteration before the current inspection iteration; wherein the point position corresponding to the current inspection iteration is based on the The target data bit width corresponding to the current inspection iteration and the data to be quantified are determined;
  • the first error determining unit is configured to determine a first error according to the first average value and the second average value, and the first error is used to characterize the variation range of the point position.
  • Clause B48 The device according to clause B47, wherein the second mean value determining unit is specifically configured to:
  • the second average value is determined according to the point position of the current inspection iteration and the preset number of intermediate sliding average values.
  • Clause B49 The device according to clause B47, wherein the second average value determining unit is specifically configured to determine the second average value according to a point position corresponding to the current inspection iteration and the first average value.
  • Clause B50 The device according to clause B47, wherein the second average value determining unit is configured to update the second average value according to the acquired data bit width adjustment value of the current inspection iteration;
  • the data bit width adjustment value of the current inspection iteration is determined according to the target data bit width and the initial data bit width of the current inspection iteration.
  • Clause B51 The device according to clause B50, wherein the second average value determining unit is configured to update the second average value according to the acquired data bit width adjustment value of the current inspection iteration, specifically:
  • the second average value is reduced according to the data bit width adjustment value of the current inspection iteration
  • the second average value is increased according to the data bit width adjustment value of the current inspection iteration.
  • Clause B52 The device according to clause B47, wherein the iteration interval determination module is configured to determine the target iteration interval according to the first error, and the target iteration interval is negatively related to the first error.
  • the second acquisition module is used to acquire the change trend of the data bit width; determine the data change range of the data to be quantified according to the change range of the point position and the change trend of the data bit width.
  • Clause B54 The device according to clause B53, wherein the iteration interval determination module is further configured to determine the target iteration interval according to the acquired first error and second error; wherein the first error is used to characterize the variation range of the point position, and the second error is used to characterize the change trend of the data bit width.
  • Clause B55 The device according to clause B53, wherein the iteration interval determination module is configured to determine the target iteration interval according to the acquired first error and second error, specifically for:
  • the target iteration interval is determined according to the target error, wherein the target error is negatively related to the target iteration interval.
  • Clause B56 The device according to clause B54 or clause B55, wherein the second error is determined according to a quantization error
  • the quantization error is determined based on the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration, and the second error is positively correlated with the quantization error.
  • the iteration interval determination module is further configured to, when the current inspection iteration is greater than or equal to the second preset iteration and the second error is greater than the preset error value, determine the first target iteration interval according to the data variation range of the data to be quantized.
  • In a possible implementation, the data to be calculated in the neural network is usually in a floating-point data format or a fixed-point data format with higher precision. Performing operations on such higher-precision data leads to a large amount of computation and memory-access overhead for the neural network operation.
  • the neural network quantization method, device, computer equipment and storage medium provided by the embodiments of the present disclosure can perform local quantization of the data to be calculated in the neural network according to different types of data to be calculated.
  • Compared with high-precision data, the quantized data format is usually a fixed-point data format with a shorter bit width and lower precision, and using lower-precision quantized data to perform neural network operations can reduce the amount of computation and memory access.
  • the to-be-calculated data in the floating-point data format can be quantized into the to-be-calculated data in the fixed-point data format, and the to-be-calculated data in the fixed-point format with higher precision can be quantized into the data to be calculated in the fixed-point format with lower precision.
  • the size of the neural network model is reduced, and the performance requirements for the terminal running the neural network model are reduced, so that the neural network model can be applied to terminals such as mobile phones with relatively limited computing power, size, and power consumption.
  • the quantization accuracy refers to the size of the error between the quantized data and the pre-quantized data.
  • The quantization accuracy can affect the accuracy of the neural network operation result: the higher the quantization accuracy, the higher the accuracy of the calculation result, but also the greater the amount of calculation and the memory-access overhead.
  • the quantized data with a longer bit width has a higher quantization accuracy, and is also more accurate when used to perform neural network operations.
  • quantization with a longer bit width requires more data operations, more memory access overhead, and lower operation efficiency.
  • the quantized data obtained by using different quantization parameters will have different quantization precisions, which will produce different quantization results, and will also have different effects on the calculation efficiency and accuracy of the calculation results.
  • Therefore, when quantizing the neural network, in order to balance the calculation efficiency and the accuracy of the calculation results, a quantized data bit width and quantization parameters that better match the data characteristics of the data to be calculated can be used.
  • the data to be calculated in the neural network may include at least one of weight, neuron, bias, and gradient.
  • the data to be calculated is a matrix containing multiple elements.
  • In traditional neural network quantization, the whole data to be calculated is quantized before any operation is performed. However, when operations are performed with the quantized data, it is common to use only part of the overall quantized data at a time.
  • For example, in a convolutional layer, when the overall quantized input neurons are used for a convolution operation, quantized neurons with dimensions matching the convolution kernel are extracted from the overall quantized input neurons according to the dimensions and stride of the convolution kernel, and the convolution operation is performed on them.
  • In a matrix multiplication operation, quantized neurons are extracted row by row from the overall quantized input neurons to perform the matrix multiplication. Therefore, in the traditional neural network quantization method, the entire data to be calculated is quantized and then operations are performed on parts of the quantized data, which makes the overall operation efficiency low. In addition, quantizing the entire data to be calculated before performing operations requires storing the entire quantized data, which occupies a large storage space.
  • In a possible implementation, the neural network quantization method can be applied to a processor. The processor can be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor for performing artificial intelligence operations.
  • Artificial intelligence operations may include machine learning operations, brain-like operations, etc. Among them, machine learning operations include neural network operations, k-means operations, and support vector machine operations.
  • The artificial intelligence processor may include, for example, one of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing unit), and a Field-Programmable Gate Array (FPGA) chip, or a combination thereof.
  • the processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run various tasks assigned to it, such as: convolution operation tasks, pooling tasks Or fully connected tasks, etc.
  • the present disclosure does not limit the processing unit and the tasks executed by the processing unit.
  • Figure 4-1 shows a flowchart of a neural network quantification method according to an embodiment of the present disclosure.
  • the method can be applied to any layer in the neural network, and the method includes steps S3-11 to S3-13.
  • This method can be applied to the processor 100 shown in FIG. 1.
  • the processing unit 101 is configured to execute step S3-11 to step S3-13.
  • the storage unit 102 is used to store data related to the processing procedures from step S3-11 to step S3-13, such as the data to be quantized, the quantization parameter, and the data bit width.
  • In step S3-11, multiple data to be quantized are determined in the target data of the layer to be quantized, each of the data to be quantized being a subset of the target data. The target data is any data to be calculated in the layer to be quantized that is to be quantized, and the data to be calculated includes at least one of input neurons, weights, biases, and gradients.
  • the layer to be quantified in the neural network can be any layer in the neural network. Some or all of the layers in the neural network can be determined as the layers to be quantified according to requirements. When the neural network includes multiple layers to be quantized, each layer to be quantized may be continuous or discontinuous. According to different neural networks, the types of layers to be quantized can also be different. For example, the layers to be quantized can be convolutional layers, fully connected layers, etc. The present disclosure does not limit the number and types of layers to be quantized.
  • the data to be calculated includes at least one of neuron, weight, bias, and gradient. At least one of neurons, weights, biases, and gradients in the layer to be quantified can be quantified according to requirements.
  • The target data is any data to be calculated that is to be quantized. For example, if the data to be calculated in a layer are neurons, weights, and biases, and the neurons and weights need to be quantized, then the neurons are target data 1 and the weights are target data 2.
  • For each kind of target data, the quantization method in the present disclosure can be used to quantize it to obtain the quantized data corresponding to it; the quantized data of the various target data, together with the data to be calculated that does not need to be quantized, are then used to perform the operations of the layer to be quantized.
  • The inference stage of the neural network operation may include a stage in which the trained neural network performs forward operations to complete the set task.
  • In the inference stage of the neural network, at least one of neurons, weights, biases, and gradients can be used as the data to be quantized; after quantization according to the method in the embodiments of the present disclosure, the quantized data is used to complete the operations of the layer to be quantized.
  • the fine-tuning stage of the neural network operation may include: performing forward and backward operations of a preset number of iterations on the trained neural network, and fine-tuning the parameters to adapt to the stage of the set task.
  • at least one of neurons, weights, biases, and gradients can be quantified according to the method in the embodiment of the present disclosure, and then the quantized data is used to complete the forward direction of the layer to be quantized. Operation or reverse operation.
  • the training phase of the neural network operation may include: iterative training of the initialized neural network to obtain a trained neural network, and the trained neural network can perform specific tasks.
  • the training phase of the neural network at least one of neurons, weights, biases, and gradients can be quantized according to the method in the embodiment of the present disclosure, and then the quantized data is used to complete the forward operation of the layer to be quantized Or reverse operation.
  • a subset of the target data can be used as the data to be quantified, the target data can be divided into multiple subsets in different ways, and each subset can be used as the data to be quantified.
  • the target data can be divided into multiple data to be quantified according to the type of operation to be performed on the target data. For example, when the target data needs to be subjected to a convolution operation, the target data can be divided into multiple data to be quantized corresponding to the convolution kernel according to the height and width of the convolution kernel.
  • the target data is a left matrix that requires a matrix multiplication operation, the target data can be divided into multiple data to be quantized by rows.
  • the target data can be divided into multiple data to be quantized at one time, or the target data can be divided into multiple data to be quantized in sequence according to the order of operations.
  • The target data may also be divided into multiple data to be quantized according to a preset data division method.
  • the preset data division method may be: division according to a fixed data size, or division according to a fixed data shape.
  • each data to be quantized can be quantized separately, and operations can be performed based on the quantized data of each data to be quantized.
  • Since the quantization time required for one piece of data to be quantized is shorter than the overall quantization time of the target data, after one piece of data to be quantized is quantized, its quantized data can be used to perform subsequent operations, without waiting for all of the target data to be quantized before performing operations. Therefore, the quantization method of target data in the present disclosure can improve the calculation efficiency of the target data.
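  • The pipelined behavior described above can be sketched as follows (Python/NumPy); the chunking rule, the quantize function, and the compute placeholder are illustrative assumptions rather than the disclosed operations:

```python
import numpy as np

def quantize(chunk, s, n=8):
    # Round-to-nearest fixed-point quantization at point position s (sketch).
    limit = 2 ** (n - 1) - 1
    return np.clip(np.round(chunk / 2.0 ** s), -limit, limit)

def compute(q_chunk):
    return q_chunk.sum()  # stand-in for the real operation of the layer

target = np.random.randn(8, 64).astype(np.float32)
results = []
# Each row group is one piece of data to be quantized; it is quantized and
# used for computation immediately, without waiting for the whole target data.
for chunk in np.split(target, 4, axis=0):
    results.append(compute(quantize(chunk, s=-5)))
```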
  • In step S3-12, each of the data to be quantized is quantized according to the corresponding quantization parameter to obtain the quantized data corresponding to each of the data to be quantized.
  • the quantization parameter corresponding to the data to be quantized may be one quantization parameter or multiple quantization parameters.
  • the quantization parameters may include parameters used for quantizing the data to be quantized, such as point positions.
  • the point position can be used to determine the position of the decimal point in the quantized data.
  • the quantization parameters can also include scaling factors, offsets, and so on.
  • the manner of determining the quantization parameter corresponding to the data to be quantized may include a manner of determining the quantization parameter corresponding to the target data as the quantization parameter of the data to be quantized after determining the quantization parameter corresponding to the target data.
  • each target data may have a corresponding quantization parameter, and the quantization parameter corresponding to each target data may be different or the same, which is not limited in the present disclosure.
  • the quantization parameter corresponding to the target data can be determined as the quantization parameter corresponding to each data to be quantized. At this time, the quantization parameter corresponding to each data to be quantized is the same.
  • the method of determining the quantization parameter corresponding to the data to be quantized may also include a method of directly determining the quantization parameter corresponding to each data to be quantized.
  • the target data may not have a corresponding quantization parameter, or the target data may have a corresponding quantization parameter but the data to be quantized is not used.
  • the corresponding quantization parameter can be directly set for each data to be quantized.
  • the corresponding quantization parameter can also be calculated according to the data to be quantized. At this time, the quantization parameters corresponding to the data to be quantized may be the same or different.
  • the weight can be divided into multiple weight data to be quantized according to channels, and the weight data to be quantized of different channels can correspond to different quantization parameters.
  • When the quantization parameter corresponding to each data to be quantized is different, after each data to be quantized is quantized using its corresponding quantization parameter, the quantization result obtained should not affect the operation of the target data.
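  • As an illustration of the per-channel division mentioned above, the following sketch (Python/NumPy) gives each output channel of a weight tensor its own point position; the point-position rule follows formula (3-2) later in this section, and the tensor layout and the epsilon guard are assumptions:

```python
import numpy as np

def point_position(chunk, n=8):
    # Point position from the maximum absolute value and the bit width,
    # in the spirit of formula (3-2); the epsilon guard is an assumption.
    z = max(np.abs(chunk).max(), 1e-8)
    return int(np.ceil(np.log2(z / (2 ** (n - 1) - 1))))

weights = np.random.randn(16, 3, 3, 3).astype(np.float32)  # Cout x Cin x H x W
# One piece of data to be quantized per output channel, each with its own parameter.
per_channel_s = [point_position(w) for w in weights]
```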
  • The way of determining the quantization parameter corresponding to the target data or to the data to be quantized may include: directly determining a preset quantization parameter by lookup, determining the quantization parameter by looking up a correspondence, or calculating the quantization parameter according to the data to be quantized. Below, the way of determining the quantization parameter corresponding to the data to be quantized is taken as an example for description:
  • the quantization parameter corresponding to the data to be quantized can be directly set.
  • the set quantization parameters can be stored in the set storage space.
  • the set storage space can be on-chip or off-chip storage space.
  • When each data to be quantized is to be quantized, it can be quantized after the corresponding quantization parameter is fetched from the set storage space.
  • the quantization parameter corresponding to each type of data to be quantized can be set according to empirical values.
  • the stored quantization parameters corresponding to each type of data to be quantized can also be updated according to requirements.
  • The quantization parameter can also be determined by looking up the correspondence between data features and quantization parameters according to the data features of each data to be quantized. For example, sparse and dense data distributions of the data to be quantized can correspond to different quantization parameters respectively; the quantization parameter corresponding to the data distribution of the data to be quantized can be determined by looking up the correspondence.
  • the quantization parameter corresponding to each layer to be quantized can be calculated by using the set quantization parameter calculation method according to the data to be quantized.
  • the point position in the quantization parameter can be calculated by using a rounding algorithm according to the maximum value of the absolute value of the data to be quantized and the preset data bit width.
  • In step S3-13, the quantization result of the target data is obtained according to the quantized data corresponding to each of the data to be quantized, so that the layer to be quantized performs operations according to the quantization result of the target data.
  • In a possible implementation, a set quantization algorithm can be used to quantize the data to be quantized according to the quantization parameters to obtain the quantized data. For example, a rounding algorithm can be used as the quantization algorithm: the data to be quantized is rounded and quantized according to the data bit width and the point position to obtain the quantized data.
  • The rounding algorithm may include rounding up, rounding down, rounding toward zero, and rounding half up. The present disclosure does not limit the specific implementation of the quantization algorithm.
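  • A minimal sketch of rounding quantization given a data bit width n and a point position s (round half up is shown; the other rounding modes are analogous, and the symmetric clamping range is an assumption):

```python
import numpy as np

def round_quantize(x, s, n=8):
    # Quantize x to n-bit fixed-point values at point position s.
    # Representable integer range: [-(2**(n-1) - 1), 2**(n-1) - 1].
    q = np.floor(x / 2.0 ** s + 0.5)  # round half up
    limit = 2 ** (n - 1) - 1
    return np.clip(q, -limit, limit)

x = np.array([0.3, -1.7, 2.5], dtype=np.float32)
q = round_quantize(x, s=-4)  # the fixed-point value represented is q * 2**s
```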
  • Each data to be quantized can be quantized using its corresponding quantization parameter. Since the quantization parameter corresponding to each data to be quantized fits the characteristics of that data more closely, the quantization accuracy of each kind of quantized data in each layer to be quantized is more in line with the operation requirements of the target data and of the layer to be quantized. On the premise of ensuring the accuracy of the operation result of the layer to be quantized, the operation efficiency of the layer to be quantized can be improved, achieving a balance between operation efficiency and the accuracy of operation results. Further, the target data is divided into multiple data to be quantized that are quantized separately.
  • After the first data to be quantized is quantized, operations can be performed based on its quantization result while the second data to be quantized is being quantized. This improves the overall operation efficiency of the target data and thus the operation efficiency of the layer to be quantized.
  • the quantized data of the data to be quantized can be combined to obtain the quantized result of the target data. It is also possible to perform a set operation on the quantized data of each data to be quantized to obtain the quantized result of the target data. For example, the quantized data of each data to be quantized can be weighted according to the set weight to obtain the quantized result of the target data. This disclosure does not limit this.
  • the data to be quantified can be quantified offline or online.
  • offline quantization can be to use quantization parameters to perform offline processing on the data to be quantized.
  • Online quantization can be the online processing of the data to be quantized using quantization parameters.
  • For example, when the neural network runs on an artificial intelligence chip, the data to be quantized and the quantization parameters can be sent to a computing device outside the artificial intelligence chip for offline quantization, or a computing device outside the artificial intelligence chip can compute the quantization parameters from the data to be quantized in advance for offline quantization. In the process of the artificial intelligence chip running the neural network, the chip can also use the quantization parameters to quantize the data to be quantized online. The present disclosure does not limit whether the quantization process of each data to be quantized is online or offline.
  • In this embodiment, the method includes: determining multiple data to be quantized in the target data of the layer to be quantized, where each data to be quantized is a subset of the target data, the target data is any kind of data to be calculated in the layer to be quantized that is to be quantized, and the data to be calculated includes at least one of input neurons, weights, biases, and gradients; quantizing each data to be quantized according to the corresponding quantization parameter to obtain the quantized data corresponding to each data to be quantized; and obtaining the quantization result of the target data according to the quantized data corresponding to each data to be quantized, so that the layer to be quantized performs operations based on the quantization result of the target data.
  • The quantization process and operation process of each data to be quantized can be executed in parallel, which can improve the quantization efficiency and operation efficiency of the target data, and can also improve the quantization efficiency and operation efficiency of the layer to be quantized and, in turn, of the entire neural network.
  • the layer to be quantized is a convolutional layer
  • the target data is an input neuron.
  • determining multiple data to be quantized in the target data of the layer to be quantized may include:
  • a plurality of data to be quantized corresponding to the convolution kernel is determined according to the dimensions and stride of the convolution kernel, wherein the dimensions of the convolution kernel include the height, the width, and the number of channels.
  • the dimensions of the input neurons of the convolutional layer may include batch number (batch, B), channel (channel, C), height (height, H), and width (width, W).
  • each batch of input neurons can be regarded as three-dimensional data whose dimensions are channel, height and width.
  • Each batch of input neurons can correspond to multiple convolution kernels, and the number of channels of each batch of input neurons is consistent with the number of channels of each convolution kernel corresponding to it.
  • According to the height, width, and stride of the convolution kernel, the partial data (subsets) of each batch of input neurons corresponding to the convolution kernel can be determined as the multiple data to be quantized corresponding to that batch of input neurons and the convolution kernel.
  • Fig. 4-2 shows a schematic diagram of determining data to be quantized according to an input neuron according to a convolution kernel according to an embodiment of the present disclosure.
  • As shown in Figure 4-2, the dimension of the input neuron is 5 × 5 × 3 (H × W × C), and the dimension of a corresponding convolution kernel (not shown in the figure) is 3 × 3 × 3 (H × W × C).
  • The data to be quantized 1 determined according to the convolution kernel is shown; its color is slightly lighter than that of the input neuron, and its dimension is 3 × 3 × 3 (H × W × C).
  • FIG. 4-3 shows a schematic diagram of determining data to be quantized according to an input neuron according to a convolution kernel according to an embodiment of the present disclosure.
  • The data to be quantized 2 determined according to the convolution kernel is shown; its color is slightly darker than that of the input neuron, and its dimension is 3 × 3 × 3 (H × W × C).
  • Compared with the data to be quantized 1, the data to be quantized 2 has moved one cell to the right in the W dimension, consistent with the stride of 1.
  • The dimensions of the data to be quantized 1 and the data to be quantized 2 are consistent with the dimensions of the convolution kernel.
  • In a possible implementation, the quantization process is performed in parallel for each data to be quantized. Since each data to be quantized is smaller than the input neuron, the amount of calculation for quantizing one piece of data to be quantized is smaller than that for quantizing the input neuron as a whole. Therefore, the quantization method in this embodiment can increase the quantization speed and improve the quantization efficiency of the input neuron. It is also possible to divide the input neuron according to the dimensions and stride of the convolution kernel and, after each data to be quantized is obtained in turn, convolve the obtained data to be quantized with the convolution kernel. The quantization process and the convolution operation of each data to be quantized can be executed in parallel, so the quantization method in this embodiment can improve both the quantization efficiency and the operation efficiency of the input neuron.
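  • The window-by-window division described above can be sketched as follows (Python/NumPy); the H x W x C layout and the helper name are assumptions for illustration:

```python
import numpy as np

def conv_windows(x, kh, kw, stride=1):
    # Yield the kh x kw x C sub-blocks visited by the kernel; each
    # sub-block is one piece of data to be quantized.
    H, W, _ = x.shape
    for i in range(0, H - kh + 1, stride):
        for j in range(0, W - kw + 1, stride):
            yield x[i:i + kh, j:j + kw, :]

x = np.random.randn(5, 5, 3).astype(np.float32)  # matches Figure 4-2
for window in conv_windows(x, kh=3, kw=3):
    pass  # quantize(window) and convolve with the kernel, possibly in parallel
```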
  • the dimension of each data to be quantified determined in the input neuron may not be consistent with the dimension of the convolution kernel.
  • the dimension of each data to be quantized may be smaller than the dimension of the convolution kernel, and at least one dimension of the convolution kernel is a multiple of the corresponding dimension of the data to be quantified.
  • the dimension of each data to be quantized may also be greater than the dimension of the convolution kernel, and at least one dimension of the data to be quantized is a multiple of the corresponding dimension of the convolution kernel.
  • For example, the dimension of each data to be quantized may be smaller than the dimension of the convolution kernel: if the dimension of the convolution kernel A is 8 × 8 × 3, the dimension of the data to be quantized A1 may be 4 × 8 × 3 and the dimension of the data to be quantized A2 may be 4 × 8 × 3; the subset composed of the data to be quantized A1 and the data to be quantized A2 is the data that is convolved with the convolution kernel A.
  • The quantization results of the data to be quantized A1 and the data to be quantized A2 can be spliced, and the convolution operation is performed with the convolution kernel A according to the spliced result.
  • The dimension of each data to be quantized may also be greater than the dimension of the convolution kernel: for example, if the dimension of the convolution kernel A is 8 × 8 × 3, the dimension of the data to be quantized A1 may be 16 × 8 × 3.
  • a quantization parameter corresponding to the target data may be used for quantization in the process of quantizing the target data.
  • the quantization parameter corresponding to each data to be quantized can be used for quantization.
  • The quantization parameters corresponding to the data to be quantized can be preset or calculated according to the data to be quantized. No matter which way the quantization parameter corresponding to the data to be quantized is determined, the quantization parameter of each data to be quantized can be made more in line with the quantization needs of that data.
  • For example, when the corresponding quantization parameter is calculated according to the target data, the quantization parameter is calculated using the maximum and minimum values of all the elements in the target data; when the corresponding quantization parameter is calculated according to the data to be quantized, the maximum and minimum values of the elements in each data to be quantized are used, so the quantization parameter of the data to be quantized can fit the data characteristics of that data more closely than the quantization parameter of the target data, making the quantization result of the data to be quantized more accurate and the quantization precision higher.
  • a plurality of data to be quantized corresponding to the convolution kernel is determined according to the dimensions and stride of the convolution kernel, wherein the dimensions of the convolution kernel include the height, the width, and the number of channels.
  • the calculation amount for quantizing each data to be quantized is less than the calculation amount for quantizing the target data, which can improve the quantization efficiency of the target data.
  • Parallel execution of the quantization process and calculation process of each data to be quantized can improve the quantization efficiency and calculation efficiency of the target data.
  • Each data to be quantized is quantized according to the corresponding quantization parameter, and the quantization parameter can better meet the quantization requirements of the data to be quantized, so that the quantization result of the data to be quantized is more accurate.
  • determining multiple data to be quantized in the target data of the layer to be quantized includes:
  • a plurality of data to be quantized is determined in the target data of the layer to be quantized, and the dimensions of the target data include batch number, channel, height, and width.
  • By dividing the target data, multiple data to be quantized can be obtained. The target data can be divided according to one of its dimensions. For example, the data of one or more batches in the target data of the layer to be quantized can be determined as one data to be quantized: assuming that the target data B1 has 3 batches of data, if the data of one batch is determined as one data to be quantized, the target data B1 can be divided into 3 data to be quantized. It is also possible to determine the data of one or more channels in the target data of the layer to be quantized as one data to be quantized.
  • For example, if the target data B2 corresponds to 4 channels, the target data B2 can be divided into two data to be quantized, each of which includes the data of two channels.
  • The target data can also be divided according to height and width. For example, assuming that the target data is an input neuron with a dimension of 4 × 8 × 3, the input neuron can be divided into two data to be quantized based on half the width, each with a dimension of 4 × 4 × 3. It is also possible to divide the input neuron into two data to be quantized based on half the height, each with a dimension of 2 × 8 × 3.
  • The target data can also be divided according to multiple dimensions of the target data. For example, the target data can be divided according to both its height and its width: the input neuron above can be divided into 4 data to be quantized based on half the width and half the height, each with a dimension of 2 × 4 × 3.
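  • The dimension-based divisions above can be sketched as follows (Python/NumPy); the shapes follow the examples in the text, and the B x C x H x W layout for B1 and B2 is an assumption:

```python
import numpy as np

# Target data B1 with 3 batches: one data to be quantized per batch.
b1 = np.random.randn(3, 4, 4, 8).astype(np.float32)   # B x C x H x W (assumed)
by_batch = np.split(b1, 3, axis=0)

# Target data B2 with 4 channels: two data to be quantized, two channels each.
b2 = np.random.randn(1, 4, 4, 8).astype(np.float32)
by_channel = np.split(b2, 2, axis=1)

# Input neuron of dimension 4 x 8 x 3 (H x W x C): divide by half the width,
# by half the height, or by both (giving 4 pieces of 2 x 4 x 3).
x = np.random.randn(4, 8, 3).astype(np.float32)
by_width = np.split(x, 2, axis=1)    # two pieces of 4 x 4 x 3
by_height = np.split(x, 2, axis=0)   # two pieces of 2 x 8 x 3
by_both = [p for h in np.split(x, 2, axis=0) for p in np.split(h, 2, axis=1)]
```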
  • determining multiple data to be quantized in the target data of the layer to be quantized may include:
  • multiple data to be quantized are determined in the target data of the layer to be quantized, and the size of each data to be quantized is positively correlated with the real-time processing capability.
  • The real-time processing capability of a device running the neural network includes information related to the device's data processing capability, such as: the speed at which the device quantizes the target data, the speed at which it operates on the quantized data, and the amount of data the device can process when quantizing and operating on the target data.
  • For example, the size of each data to be quantized can be determined according to the speed of quantizing the target data and the speed of operating on the quantized data, so that quantizing one piece of data to be quantized takes the same time as operating on the previously quantized data; quantization and operation can then proceed simultaneously, which improves the operation efficiency of the target data. The stronger the real-time processing capability of the device running the neural network, the larger the size of each data to be quantized.
  • the method may further include: calculating the corresponding quantization parameter according to each of the data to be quantized and the corresponding data bit width.
  • the quantization parameter may include one or more of point position, scaling factor, and offset.
  • calculating the corresponding quantization parameter according to each of the data to be quantized and the corresponding data bit width may include:
  • the first-type point position of each data to be quantized is obtained according to the maximum absolute value Z 1 in each data to be quantized and the corresponding data bit width.
  • the maximum absolute value Z 1 is the maximum value obtained by taking the absolute value of the data in the data to be quantized.
  • the quantization parameter may not include the offset.
  • where Z1 is the maximum absolute value of the elements in the data to be quantized, and A1 is the maximum value that can be represented by the quantized data after the data to be quantized is quantized with data bit width n, that is, A1 = 2^s1 × (2^(n-1) − 1). A1 needs to include Z1, and Z1 must be greater than half of A1, that is, 2^s1 × (2^(n-1) − 1) ≥ Z1 > 2^(s1-1) × (2^(n-1) − 1).
  • Therefore, the processor can calculate the first-type point position s1 according to the maximum absolute value Z1 in the data to be quantized and the data bit width n. For example, the following formula (3-2) can be used to calculate the first-type point position s1 corresponding to the data to be quantized:

        s1 = ceil(log2(Z1 / (2^(n-1) − 1)))        (3-2)

  • where ceil denotes rounding up, Z1 is the maximum absolute value in the data to be quantized, s1 is the first-type point position, and n is the data bit width.
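  • A direct transcription of formula (3-2), as reconstructed above, in Python:

```python
import math

def first_type_point_position(z1, n):
    # s1 = ceil(log2(Z1 / (2**(n-1) - 1))), formula (3-2).
    return math.ceil(math.log2(z1 / (2 ** (n - 1) - 1)))

s1 = first_type_point_position(z1=10.0, n=8)
# s1 == -3; the representable range 2**s1 * (2**(n-1) - 1) = 15.875 covers Z1 = 10.
```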
  • calculating the corresponding quantization parameter according to each of the data to be quantized and the corresponding data bit width may include:
  • the second-type point position s 2 of each data to be quantized is obtained according to the maximum value, the minimum value and the corresponding data bit width in each data to be quantized.
  • For example, the maximum value Zmax and the minimum value Zmin in the data to be quantized can be obtained first, and then formula (3-3) can be used to calculate the second-type point position s2 from Zmax, Zmin, and the data bit width.
  • In this way, the maximum absolute value is obtained directly from the saved maximum and minimum values of the data to be quantized, without consuming additional resources to take absolute values over the data to be quantized, which saves the time needed to determine the statistical result.
  • calculating the corresponding quantization parameter according to each of the data to be quantized and the corresponding data bit width includes:
  • the first-type scaling factor f′ of each data to be quantized is obtained.
  • the first-type scaling factor f′ may include a first scaling factor f1 and a second scaling factor f2.
  • the first scaling factor f1 can be calculated in the following manner (3-5):
    f1 = Z1 / A1 = Z1 / (2^s1 (2^(n-1) − 1))    (3-5)
  • the second scaling factor f2 can be calculated according to the following formula (3-6):
    f2 = 2^s1 × f1    (3-6)
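  • A minimal sketch of the reconstructed formulas (3-5)/(3-6) above; the function name and the NumPy representation are our assumptions:

```python
import math
import numpy as np

def first_type_scaling_factors(data: np.ndarray, n: int):
    # f1 = Z1 / A1 with A1 = 2^s1 * (2^(n-1) - 1), and f2 = 2^s1 * f1
    z1 = float(np.max(np.abs(data)))
    s1 = math.ceil(math.log2(z1 / (2 ** (n - 1) - 1)))
    a1 = (2.0 ** s1) * (2 ** (n - 1) - 1)  # largest value representable after quantization
    f1 = z1 / a1
    f2 = (2.0 ** s1) * f1
    return f1, f2
```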
  • calculating the corresponding quantization parameter according to each of the data to be quantized and the corresponding data bit width may include:
  • the offset of each data to be quantized is obtained.
  • FIGS. 4-4 show a schematic diagram of a symmetric fixed-point number representation according to an embodiment of the present disclosure.
  • the number field of the data to be quantized is distributed with "0" as the symmetric center.
  • Z 1 is the maximum absolute value of all floating-point numbers in the number domain of the data to be quantized.
  • A1 is the maximum floating-point number that can be represented by an n-bit fixed-point number; converted to a fixed-point number, A1 corresponds to 2^(n-1) − 1.
  • A1 needs to include Z1.
  • FIGs 4-5 show schematic diagrams of a fixed-point number representation introducing an offset according to an embodiment of the present disclosure. As shown in Figure 4-5.
  • the number field of the data to be quantized is not distributed symmetrically with "0" as the center.
  • Z min is the minimum value of all floating-point numbers in the number field of the data to be quantized
  • Z max is the maximum value of all floating-point numbers in the number field of the data to be quantized.
  • A2 is the maximum value of the translated floating-point numbers that can be represented by an n-bit fixed-point number, i.e. A2 = 2^s2 (2^(n-1) − 1).
  • P is the center point between Zmin and Zmax. Shifting the number field of the data to be quantized as a whole, the shifted number field is distributed with "0" as the symmetric center, which avoids data "overflow".
  • the maximum absolute value in the number field of the shifted data to be quantized is Z2. It can be seen from Figures 4-5 that the offset is the horizontal distance from point "0" to point "P"; this distance is called the offset o.
  • the offset can be calculated from the minimum value Zmin and the maximum value Zmax according to the following formula (3-7):
    o = (Zmin + Zmax) / 2    (3-7)
  • o represents the offset
  • Z min represents the minimum value among all the elements of the data to be quantized
  • Z max represents the maximum value among all the elements of the data to be quantized.
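  • A one-function sketch of the reconstructed offset formula (3-7); the name is illustrative only:

```python
import numpy as np

def offset(data: np.ndarray) -> float:
    # o = (Zmin + Zmax) / 2: horizontal distance from point "0" to point "P"
    return (float(np.min(data)) + float(np.max(data))) / 2.0
```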
  • calculating the corresponding quantization parameter according to each of the data to be quantized and the corresponding data bit width may include:
  • the quantization parameter includes an offset
  • the second-type scaling factor f″ of each data to be quantized is obtained.
  • the second-type scaling factor f″ may include a third scaling factor f3 and a fourth scaling factor f4.
  • A2 is the maximum value that can be represented by the quantized data after the shifted data to be quantized is quantized with data bit width n, i.e. A2 = 2^s2 (2^(n-1) − 1).
  • the maximum absolute value Z2 in the number field of the shifted data to be quantized can be calculated from the maximum value Zmax and the minimum value Zmin in the data to be quantized, and then the third scaling factor f3 is calculated according to the following formula (3-8):
    f3 = Z2 / A2 = Z2 / (2^s2 (2^(n-1) − 1))    (3-8)
  • the fourth scaling factor f4 can be calculated according to the following formula (3-9):
    f4 = 2^s2 × f3    (3-9)
  • the quantization parameters used are different, and the data used for quantization is different.
  • the quantization parameter may include the first-type point position s1.
  • the following formula (3-10) can be used to quantize the data to be quantized to obtain the quantized data Ix:
    Ix = round(Fx / 2^s1)    (3-10)
  • where Ix is the quantized data, Fx is the data to be quantized, and round is the rounding operation.
  • the quantized data of the target data can be dequantized according to formula (3-11) to obtain the dequantized data of the target data:
    F̂x = round(Fx / 2^s1) × 2^s1    (3-11)
  • the quantization parameter may include a first-type point position and a first scaling factor.
  • the following formula (3-12) can be used to quantize the data to be quantized to obtain the quantized data Ix:
    Ix = round(Fx / (2^s1 × f1))    (3-12)
  • the quantized data of the target data can be dequantized according to formula (3-13) to obtain the dequantized data of the target data:
    F̂x = round(Fx / (2^s1 × f1)) × 2^s1 × f1    (3-13)
  • the quantization parameter may include a second scaling factor.
  • the following formula (3-14) can be used to quantize the data to be quantized to obtain the quantized data Ix:
    Ix = round(Fx / f2)    (3-14)
  • the quantized data of the target data can be dequantized according to formula (3-15) to obtain the dequantized data of the target data:
    F̂x = round(Fx / f2) × f2    (3-15)
  • the quantization parameter may include an offset.
  • the following formula (3-16) can be used to quantize the data to be quantized to obtain the quantized data Ix:
    Ix = round(Fx − o)    (3-16)
  • the quantized data of the target data can be dequantized according to formula (3-17) to obtain the dequantized data of the target data:
    F̂x = round(Fx − o) + o    (3-17)
  • the quantization parameter may include the second-type point position and the offset.
  • the following formula (3-18) can be used to quantize the data to be quantized to obtain the quantized data Ix:
    Ix = round((Fx − o) / 2^s2)    (3-18)
  • the quantized data of the target data can be dequantized according to formula (3-19) to obtain the dequantized data of the target data:
    F̂x = round((Fx − o) / 2^s2) × 2^s2 + o    (3-19)
  • the quantization parameter may include the second-type scaling factor f″ and the offset o.
  • the following formula (3-20) can be used to quantize the data to be quantized to obtain the quantized data Ix:
    Ix = round((Fx − o) / f″)    (3-20)
  • the quantized data of the target data can be dequantized according to formula (3-21) to obtain the dequantized data of the target data:
    F̂x = round((Fx − o) / f″) × f″ + o    (3-21)
  • the quantization parameter may include the second-type point position, the second-type scaling factor, and the offset.
  • the following formula (3-22) can be used to quantize the data to be quantized to obtain the quantized data Ix:
    Ix = round((Fx − o) / (2^s2 × f″))    (3-22)
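  • Putting the reconstructed formulas together, the sketch below quantizes and dequantizes with the second-type point position, a second-type scaling factor and the offset; it is a minimal illustration under the assumptions above, not the disclosure's reference implementation, and the function names are ours:

```python
import math
import numpy as np

def quantize_with_offset(fx: np.ndarray, n: int):
    # Shift by the offset (3-7), then quantize with the second-type
    # point position and third scaling factor, as in (3-22).
    z_min, z_max = float(fx.min()), float(fx.max())
    o = (z_min + z_max) / 2.0                       # offset
    z2 = (z_max - z_min) / 2.0                      # max absolute value after shifting
    s2 = math.ceil(math.log2(z2 / (2 ** (n - 1) - 1)))
    f3 = z2 / ((2.0 ** s2) * (2 ** (n - 1) - 1))    # third scaling factor
    ix = np.round((fx - o) / ((2.0 ** s2) * f3))    # quantized data
    return ix, s2, f3, o

def dequantize_with_offset(ix, s2, f3, o):
    # Mirror of the dequantization formulas: F̂x = Ix * 2^s2 * f3 + o
    return ix * (2.0 ** s2) * f3 + o
```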


Abstract

A data processing method and apparatus, a computer device, and a storage medium. The disclosed board card includes: a storage device (390), an interface apparatus (391), a control device (392), and an artificial intelligence chip (389) including the data processing apparatus; the artificial intelligence chip (389) is connected to the storage device (390), the control device (392), and the interface apparatus (391) respectively; the storage device (390) is used to store data; the interface apparatus (391) is used to implement data transmission between the artificial intelligence chip (389) and external devices; and the control device (392) is used to monitor the state of the artificial intelligence chip (389). The data processing method and apparatus, computer device, and storage medium quantize the data to be quantized using corresponding quantization parameters, which reduces the storage space occupied by stored data while ensuring precision, guarantees the accuracy and reliability of operation results, and improves operation efficiency.

Description

数据处理方法、装置、计算机设备和存储介质
技术领域
本公开涉及计算机技术领域,特别是涉及一种神经网络的数据量化方法、装置、计算机设备和存储介质。
背景技术
神经网络(neural network,NN)是一种模仿生物神经网络的结构和功能的数学模型或计算模型。神经网络通过样本数据的训练,不断修正网络权值和阈值使误差函数沿负梯度方向下降,逼近期望输出。它是一种应用较为广泛的识别分类模型,多用于函数逼近、模型识别分类、数据压缩和时间序列预测等。神经网络被应用到图像识别、语音识别、自然语言处理等领域中,然而,随着神经网络复杂度提高,数据的数据量和数据维度都在不断增大,不断增大的数据量等对运算装置的数据处理效率、存储装置的存储容量及访存效率等提出了较大的挑战。相关技术中,采用固定位宽对神经网络的运算数据进行量化,即将浮点型的运算数据转换为定点型的运算数据,以实现神经网络的运算数据的压缩。但相关技术中针对整个神经网络采用相同的量化方案,但神经网络的不同运算数据之间可能存在较大的差异,往往会导致精度较低,影响数据运算结果。
发明内容
基于此,有必要针对上述技术问题,提供一种神经网络的数据量化处理方法、装置、计算机设备和存储介质。
根据本公开的一方面,提供了一种神经网络的数据量化处理方法,应用于处理器,所述方法包括:
按照神经网络运算过程中对应层以及对应层中的通道数统计待量化数据,确定每种待量化数据的统计结果;
根据统计结果以及数据位宽,确定对应层中每种待量化数据的量化参数;
利用对应的量化参数对所述待量化数据进行量化,
其中,所述待量化数据包括所述神经网络的神经元、权值、梯度中的至少一种数据,所述量化参数包括点位置参数、缩放系数和偏移量。
根据本公开的另一方面,提供了一种神经网络的数据量化处理装置,应用于处理器,所述装置包括:
数据统计模块,按照神经网络运算过程中对应层以及对应层中的通道数统计待量化数据,确定每种待量化数据的统计结果;
量化参数确定模块,根据统计结果以及数据位宽,确定对应层中每种待量化数据的量化参数;
量化处理模块,利用对应的量化参数对所述待量化数据进行量化,
其中,所述待量化数据包括所述神经网络的神经元、权值、梯度中的至少一种数据,所述量化参数包括点位置参数、缩放系数和偏移量。
根据本公开的另一方面,提供了一种人工智能芯片,其特征在于,所述芯片包括上述神经网络的数据量化处理装置。
根据本公开的另一方面,提供了一种电子设备,所述电子设备包括上述人工智能芯片。
根据本公开的另一方面,提供了一种板卡,所述板卡包括:存储器件、接口装置和控制器件以及 上述人工智能芯片;
其中,所述人工智能芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;
所述存储器件,用于存储数据;
所述接口装置,用于实现所述人工智能芯片与外部设备之间的数据传输;
所述控制器件,用于对所述人工智能芯片的状态进行监控。
根据本公开的另一方面,提供了一种非易失性计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述神经网络的数据量化处理方法。
本公开实施例所提供的神经网络的数据量化处理方法、装置、计算机设备和存储介质,按照神经网络运算过程中对应层以及对应层中的通道数统计待量化数据,确定每种待量化数据的统计结果;根据统计结果以及数据位宽,确定对应层中每种待量化数据的量化参数;利用对应的量化参数对待量化数据进行量化。本公开实施例所提供的神经网络的数据量化处理方法、装置、计算机设备和存储介质,利用对应的量化参数对待量化数据进行量化,在保证精度的同时,减小了存储数据所占用的存储空间,保证了运算结果的准确性和可靠性,且能够提高运算的效率,且量化同样缩减了神经网络模型的大小,降低了对运行该神经网络模型的终端的性能要求。
有鉴于此,本公开提出了一种循环神经网络的量化参数调整方法、装置及相关产品,能够提高神经网络的量化精度,保证运算结果的正确性和可靠性。
本公开提供了一种循环神经网络的量化参数调整方法,所述方法包括:
获取待量化数据的数据变动幅度;
根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,其中,所述第一目标迭代间隔包括至少一次迭代,所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
本公开还提供了一种循环神经网络的量化参数调整装置,包括存储器和处理器,所述存储器存储有计算机程序,所述处理器执行所述计算机程序时,实现上述任一项所述的方法的步骤。具体地,处理器执行上述计算机程序时,实现如下操作:
获取待量化数据的数据变动幅度;
根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,其中,所述第一目标迭代间隔包括至少一次迭代,所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
本公开还提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,所述计算机程序被执行时,实现上述任一项所述的循环神经网络的量化参数调整方法的步骤。具体地,上述计算机程序被执行时,实现如下操作:
获取待量化数据的数据变动幅度;
根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,其中,所述第一目标迭代间隔包括至少一次迭代,所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
本公开还提供了一种循环神经网络的量化参数调整装置,所述装置包括:
获取模块,用于获取待量化数据的数据变动幅度;
迭代间隔确定模块,用于根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,其中,所述目标迭代间隔包括至 少一次迭代,所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
本公开的循环神经网络的量化参数调整方法、装置及相关产品,通过获取待量化数据的数据变动幅度,并根据该待量化数据的数据变动幅度确定出第一目标迭代间隔,从而可以根据该第一目标迭代间隔调整循环神经网络的量化参数,这样可以根据确定待量化数据的数据分布特性确定循环神经网络在不同运算阶段的量化参数。相较于现有技术中,针对同一循环神经网络的各种运算数据均采用相同的量化参数的方式,本公开的方法及装置能够提高循环神经网络量化过程中的精度,进而保证运算结果的准确性和可靠性。进一步地,通过确定目标迭代间隔还可以提高量化效率。
基于此,有必要针对上述技术问题,提供一种神经网络量化方法、装置、计算机设备和存储介质。
根据本公开的一方面,提供了一种神经网络量化方法,对于所述神经网络中的任意待量化层,所述方法包括:
在所述待量化层的目标数据中确定多个待量化数据,各所述待量化数据均为所述目标数据的子集,所述目标数据为所述待量化层的任意一种待量化的待运算数据,所述待运算数据包括输入神经元、权值、偏置、梯度中的至少一种;
将各所述待量化数据分别根据对应的量化参数进行量化,得到与各所述待量化数据对应的量化数据;
根据与各所述待量化数据对应的量化数据得到所述目标数据的量化结果,以使所述待量化层根据所述目标数据的量化结果进行运算。
根据本公开的另一方面,提供了一种神经网络量化装置,对于所述神经网络中的任意待量化层,所述装置包括:
数据确定模块,在所述待量化层的目标数据中确定多个待量化数据,各所述待量化数据均为所述目标数据的子集,所述目标数据为所述待量化层的任意一种待量化的待运算数据,所述待运算数据包括输入神经元、权值、偏置、梯度中的至少一种;
数据量化模块,将各所述待量化数据分别根据对应的量化参数进行量化,得到与各所述待量化数据对应的量化数据;
数据运算模块,根据与各所述待量化数据对应的量化数据得到所述目标数据的量化结果,以使所述待量化层根据所述目标数据的量化结果进行运算。
根据本公开的另一方面,提供了一种人工智能芯片,其特征在于,所述芯片包括上述神经网络量化装置。
根据本公开的另一方面,提供了一种电子设备,所述电子设备包括上述人工智能芯片。
根据本公开的另一方面,提供了一种板卡,所述板卡包括:存储器件、接口装置和控制器件以及上述人工智能芯片;
其中,所述人工智能芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;
所述存储器件,用于存储数据;
所述接口装置,用于实现所述人工智能芯片与外部设备之间的数据传输;
所述控制器件,用于对所述人工智能芯片的状态进行监控。
根据本公开的另一方面,提供了一种非易失性计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述神经网络量化方法。
本公开实施例所提供的神经网络量化方法、装置、计算机设备和存储介质,该方法包括:在所述 待量化层的目标数据中确定多个待量化数据,各所述待量化数据均为所述目标数据的子集,所述目标数据为所述待量化层的任意一种待量化的待运算数据,所述待运算数据包括输入神经元、权值、偏置、梯度中的至少一种;将各所述待量化数据分别根据对应的量化参数进行量化,得到与各所述待量化数据对应的量化数据;根据与各所述待量化数据对应的量化数据得到所述目标数据的量化结果,以使所述待量化层根据所述目标数据的量化结果进行运算。本公开实施例所提供的神经网络量化方法、装置、计算机设备和存储介质,利用对应的量化参数对目标数据中的多个待量化数据分别进行量化,在保证精度的同时,减小了存储数据所占用的存储空间,保证了运算结果的准确性和可靠性,且能够提高运算的效率,且量化同样缩减了神经网络模型的大小,降低了对运行该神经网络模型的终端的性能要求。
通过权要中的技术特征进行推导,能够达到对应背景技术中的技术问题的有益效果。根据下面参考附图对示例性实施例的详细说明,本公开的其它特征及方面将变得清楚。
附图说明
包含在说明书中并且构成说明书的一部分的附图与说明书一起示出了本公开的示例性实施例、特征和方面,并且用于解释本公开的原理。
图1示出根据本公开实施例的处理器的示意图。
图2-1示出根据本公开实施例的神经网络的数据量化处理方法的流程图。
图2-2示出根据本公开实施例的对称的定点数表示的示意图。
图2-3示出根据本公开实施例的引入偏移量的定点数表示的示意图。
图2-4a、图2-4b为训练过程中神经网络的权值数据变动幅度曲线图。
图2-5示出根据本公开一实施例的神经网络的数据量化处理装置的框图。
图3-1为该量化参数调整装置100’的结构框图;
图3-2示出本公开一实施例的待量化数据和量化数据的对应关系示意图;
图3-3示出本公开一实施例的待量化数据的转换示意图;
图3-4示出本公开一实施例的循环神经网络的量化参数调整方法的流程图;
图3-5a示出本公开一实施例的待量化数据在运算过程中的变动趋势图;
图3-5b示出本公开一实施例的循环神经网络的展开示意图;
图3-5c示出本公开一实施例的循环神经网络的周期示意图;
图3-6示出本公开一实施例的循环神经网络的参数调整方法的流程图;
图3-7示出本公开一实施例中点位置的变动幅度的确定方法的流程图;
图3-8示出本公开一实施例中第二均值的确定方法的流程图;
图3-9示出本公开一实施例中数据位宽调整方法的流程图;
图3-10示出本公开另一实施例中数据位宽调整方法的流程图;
图3-11示出本公开又一实施例中数据位宽调整方法的流程图;
图3-12示出本公开再一实施例中数据位宽调整方法的流程图;
图3-13示出本公开另一实施例中第二均值的确定方法的流程图;
图3-14示出本公开另一实施例的量化参数调整方法的流程图;
图3-15示出本公开一实施例的量化参数调整方法中调整量化参数的流程图;
图3-16示出本公开另一实施例的参数调整方法中第一目标迭代间隔的确定方法的流程图;
图3-17示出本公开再一实施例的量化参数调整方法的流程图;
图3-18示出本公开一实施例的量化参数调整装置的结构框图;
图4-1示出根据本公开实施例的神经网络量化方法的流程图。
图4-2示出根据本公开实施例的将输入神经元按照卷积核确定待量化数据的示意图。
图4-3示出根据本公开实施例的将输入神经元按照卷积核确定待量化数据的示意图。
图4-4示出根据本公开实施例的对称的定点数表示的示意图。
图4-5示出根据本公开实施例的引入偏移量的定点数表示的示意图。
图4-6示出根据本公开实施例的神经网络量化方法的流程图。
图4-7示出根据本公开一实施例的神经网络量化装置的框图。
图5示出根据本公开实施例的板卡的结构框图。
具体实施方式
下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本公开一部分实施例,而不是全部的实施例。基于本公开中的实施例,本领域技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本公开保护的范围。
应当理解,本公开的权利要求、说明书及附图中的术语“第一”、“第二”等是用于区别不同对象,而不是用于描述特定顺序。本公开的说明书和权利要求书中使用的术语“包括”和“包含”指示所描述特征、整体、步骤、操作、元素和/或组件的存在,但并不排除一个或多个其它特征、整体、步骤、操作、元素、组件和/或其集合的存在或添加。
还应当理解,在此本公开说明书中所使用的术语仅仅是出于描述特定实施例的目的,而并不意在限定本公开。如在本公开说明书和权利要求书中所使用的那样,除非上下文清楚地指明其它情况,否则单数形式的“一”、“一个”及“该”意在包括复数形式。还应当进一步理解,在本公开说明书和权利要求书中使用的术语“和/或”是指相关联列出的项中的一个或多个的任何组合以及所有可能组合,并且包括这些组合。
如在本说明书和权利要求书中所使用的那样,术语“如果”可以依据上下文被解释为“当...时”或“一旦”或“响应于确定”或“响应于检测到”。类似地,短语“如果确定”或“如果检测到[所描述条件或事件]”可以依据上下文被解释为意指“一旦确定”或“响应于确定”或“一旦检测到[所描述条件或事件]”或“响应于检测到[所描述条件或事件]”。
图1示出根据本公开实施例的处理器的示意图。如图1所示,该处理器100可以执行下述方法,处理器100包括多个处理单元101以及存储单元102,多个处理单元101用于执行指令序列,存储单元102用于存储数据,可包括随机存储器(RAM,Random Access Memory)和寄存器堆。处理器100中的多个处理单元101既可共用部分存储空间,例如共用部分RAM存储空间和寄存器堆,又可同时拥有各自的存储空间。
随着神经网络运算复杂度的提高,数据的数据量和数据维度也在不断增大,而传统的神经网络算法通常采用浮点数据格式来执行神经网络运算,这就使得不断增大的数据量等对运算装置的数据处理效率、存储装置的存储容量及访存效率等提出了较大的挑战。为解决上述问题,相关技术中,对神经网络运算过程涉及的全部数据均由浮点数转化定点数,但由于不同的数据之间具有差异性,或者,同一数据在不同阶段具有差异性,仅“由浮点数转化定点数”时,往往会导致精度不够,从而会影响运算结果。本公开实施例所提供的神经网络的数据量化处理方法、装置、计算机设备和存储介质,按照神经网络运算过程中对应层以及对应层中的通道数统计待量化数据,确定每种待量化数据的统计结果; 根据统计结果以及数据位宽,确定对应层中每种待量化数据的量化参数。利用对应的量化参数对待量化数据进行量化,在保证精度的同时,减小了存储数据所占用的存储空间,保证了运算结果的准确性和可靠性,且能够提高运算的效率,且量化同样缩减了神经网络模型的大小,降低了对运行该神经网络模型的终端的性能要求,使神经网络模型可以应用于算力、体积、功耗相对受限的手机等终端。
根据本公开实施例的神经网络的数据量化处理方法可应用于处理器中,该处理器可以是通用处理器,例如CPU(Central Processing Unit,中央处理器),也可以是用于执行人工智能运算的人工智能处理器(IPU)。人工智能运算可包括机器学习运算,类脑运算等。其中,机器学习运算包括神经网络运算、k-means运算、支持向量机运算等。该人工智能处理器可例如包括GPU(Graphics Processing Unit,图形处理单元)、NPU(Neural-Network Processing Unit,神经网络处理单元)、DSP(Digital Signal Process,数字信号处理单元)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)芯片中的一种或组合。本公开对处理器的具体类型不作限制。
在一种可能的实现方式中,本公开中所提及的处理器可包括多个处理单元,每个处理单元可以独立运行所分配到的各种任务,如:卷积运算任务、池化任务或全连接任务等。本公开对处理单元及处理单元所运行的任务不作限制。
图2-1示出根据本公开实施例的神经网络的数据量化处理方法的流程图。如图2-1所示,该方法可以包括步骤S11至步骤S13。该方法可以应用于图1所示的处理器100。其中,处理单元101用于执行步骤S11至步骤S13。存储单元102用于存储待量化数据、统计结果、量化参数、数据位宽等与步骤S11至步骤S13的处理过程相关的数据。
在步骤S11中,按照神经网络运算过程中对应层以及对应层中的通道数统计待量化数据,确定每种待量化数据的统计结果。待量化数据包括神经网络的神经元、权值、梯度中的至少一种数据。
在步骤S12中,根据统计结果以及数据位宽,确定对应层中每种待量化数据的量化参数。量化参数包括点位置参数、缩放系数和偏移量。
在步骤S13中,利用对应的量化参数对待量化数据进行量化。
在本实施例中,神经网络运算过程中对应层可以是卷积层(Convolutional layer)、全连接层(Fully connected layer)、池化层(pooling layer)等进行神经网络运算所涉及到的对数据进行运算或处理的层,本公开对此不作限制。
在本实施例中,待量化数据是以高精度数据格式表示的数据,量化后数据是以低精度数据格式表示的,待量化数据的数据格式的精度高于量化后数据的数据格式的精度。
在本实施例中,对于进行不同运算的对应层,可以根据对应层中的通道数采用对应的量化方式进行量化,量化方式包括以下方式一和方式二。
在一种可能的实现方式中,在方式一中,在对应层的通道为单通道、或者对应层无通道时,对应层中每种待量化数据对应的量化参数和数据位宽相同。
在该实现方式中,在对应层无通道(如全连接层)或者对应层为单通道时,可以对该对应层中每种待量化数据进行统计,得到每种待量化数据针对该对应层的统计结果,进而根据统计结果和数据位宽确定出对应层的通道为单通道、或者对应层无通道时,每种待量化数据的量化参数。
在一种可能的实现方式中,在方式二中,在对应层的通道为多通道时,对应层的同一通道中的权值的缩放系数、偏移量相同,对应层中所有通道中的权值对应的点位置参数和数据位宽相同,对应层的所有通道中的神经元对应的量化参数和数据位宽相同,对应层的所有通道中的梯度对应的量化参数和数据位宽相同。
在该实现方式中,在对应层的通道为多通道时,可以对该对应层中的神经元进行统计,分别确定神经元针对该对应层的统计结果,进而根据神经元针对该对应层的统计结果和数据位宽,确定对应层的所有通道中的神经元对应的量化参数。在对应层的通道为多通道时,可以对该对应层中的梯度进行统计,分别确定梯度针对该对应层的统计结果,进而根据梯度针对该对应层的统计结果和数据位宽,确定对应层的所有通道中的梯度对应的量化参数。
在该实现方式中,在对应层的通道为多通道时,可以对该对应层每个通道中权值进行统计,得到权值在每个通道中的第一统计结果,并根据第一统计结果和数据位宽确定对应层的每一个通道中的权值的缩放系数、偏移量。还可以对该对应层所有通道中权值进行统计,得到权值在对应层中的第二统计结果,并根据第二统计结果和数据位宽确定对应层所有通道中的权值的点位置参数。
在一种可能的实现方式中,神经网络运算过程可以包括神经网络训练、神经网络推理、神经网络微调中的至少一种运算。
其中,神经网络的训练(Training)是指对神经网络(该神经网络的权值可以是随机数)进行多次迭代运算(iteration),使得神经网络的权值能够满足预设条件的过程。神经网络的训练包括正向处理和反向传播梯度。在正向处理的过程中,根据输入数据进行神经网络运算,得到运算结果。在反向传播梯度过程中,根据正向处理的正向输出结果以及预测的输出结果确定误差值,并根据该误差值确定权值梯度和/或输入数据梯度的过程。其中,误差值的导数即为梯度。神经网络的训练过程如下:处理器可以采用权值为随机数的神经网络对输入数据进行正向处理,获得正向处理结果。之后处理器根据该正向处理结果与预设参考值确定误差值,根据该误差值确定权值梯度和/或输入数据梯度。最后,处理器可以根据权值梯度更新神经网络的梯度,获得新的权值,完成一次迭代运算。处理器循环执行多次迭代运算,直至神经网络的正向处理结果满足预设条件。
神经网络微调是指对神经网络(该神经网络的权值已经处于收敛状态而非随机数)进行多次迭代运算,以使得神经网络的精度能够满足预设需求的过程。该微调过程与上述训练过程基本一致,可以认为是对处于收敛状态的神经网络进行重训练的过程。
神经网络推理(Inference)是指采用权值满足预设条件的神经网络进行正向处理,从而实现识别或分类等功能的过程,如采用神经网络进行图像识别等等。
在神经网络进行训练或微调过程中,神经网络每经过一次信号的正向处理以及对应一次误差的反向传播过程,神经网络中的权值利用梯度进行一次更新,此时称为一次迭代(iteration)。为了获得精度符合预期的神经网络,在训练过程中需要很庞大的样本数据集。在这种情况下,一次性将样本数据集输入计算机是不可能的。因此,为了解决这个问题,需要把样本数据集分成多个块,每块传递给计算机,每块数据集正向处理后,对应更新一次神经网络的权值。当一个完整的样本数据集通过了神经网络一次正向处理并且对应返回了一次权值更新,这个过程称为一个周期(epoch)。实际中,在神经网络中传递一次完整的数据集是不够的,需要将完整的数据集在同一神经网络中传递多次,即需要多个周期,最终获得精度符合预期的神经网络。
举例来说,以包括5个卷积层、3个全连接层的神经网络为例。
在正向处理中:
在卷积层和全连接层中,对每一层中的神经元分别进行量化,每一层所有的神经元具有相同的点位置参数、缩放系数、偏移量。每个神经元具有对应的数据位宽,可以是预设值、也可以是根据数据位宽对应的量化误差调整后的值。
对于全连接层中的权值,可以采用“方式一”对各对应层的权值进行量化,每一层中所有的权值具 有相同的点位置参数、缩放系数、偏移量。
对于卷积层中的权值，可以利用上述“方式二”进行量化，也即，对应层的同一通道中的权值的缩放系数、偏移量相同，对应层中所有通道中的权值对应的点位置参数和数据位宽相同。也可以利用“方式一”对各对应层的权值进行量化，每一层中所有的权值具有相同的点位置参数、缩放系数、偏移量。每个权值具有对应的数据位宽，可以是预设值、也可以是根据数据位宽对应的量化误差调整后的值。卷积层中的权值采用“方式一”进行量化的准确性相对较低，运算的速度相对高，而与采用“方式一”进行量化相比，采用“方式二”进行量化的准确性相对高，速度相对低，本领域技术人员可以根据实际需要对卷积层中权值的量化方式进行设置，本公开对此不作限制。
在反向处理中:
神经元和权值的量化过程与正向处理相同,不再赘述。
在卷积层和全连接层中,对每一层中的梯度分别进行量化,每一层所有的梯度具有相同的点位置参数、缩放系数、偏移量。每个梯度具有对应的数据位宽,可以是预设值、也可以是根据数据位宽对应的量化误差调整后的值。
在一种可能的实现方式中,统计结果可以包括以下任一种:每种待量化数据中的绝对值最大值、每种待量化数据中的最大值和最小值的距离的二分之一。其中,绝对值最大值是每种待量化数据中的最大值或最小值的绝对值。
在该实现方式中,在对应层无通道(如全连接层)或者对应层为单通道时,统计结果可以是每种待量化数据在该对应层的绝对值最大值、或者是最大值和最小值的距离的二分之一。在对应层的通道为多通道时,统计结果可以是每种待量化数据在该对应层的绝对值最大值或者最大值和最小值的距离的二分之一,也还可以包括每种待量化数据在该对应层的不同通道中的绝对值最大值、或者是最大值和最小值的距离的二分之一。
在该实现方式中,每种待量化数据在该对应层或该对应层的某个通道中的绝对值最大值,可以通过每种待量化数据中的最大值和最小值方式确认。由于量化时,常规情况下会将对应层的待量化数据对应的最大值和最小值保存下来,直接基于保存的待量化数据对应的最大值和最小值来获取绝对值最大值,无需消耗更多的资源去对待量化数据求绝对值,节省确定统计结果的时间。
在一种可能的实现方式中,缩放系数可以是根据点位置参数、统计结果、数据位宽确定的。
在一种可能的实现方式中,偏移量是根据每种待量化数据的统计结果确定的。
在一种可能的实现方式中,数据位宽可以为预设值。例如,数据位宽的预设值可以是8bit。
在一种可能的实现方式中，在步骤S13中可以基于下述公式(1-1)、公式(1-2)对待量化数据进行量化，得到量化数据I_x：
F_x ≈ I_x×2^s×f + O   (1-1)
I_x = round((F_x − O)/(2^s×f))   (1-2)
其中,F x为量化前x的浮点值,I x为量化后x的n位二进制表示值,I x的位数(也即数据位宽)为n,则I x表示的范围为[-2 n-1,2 n-1-1]。S为点位置参数,取整数,其与定点数点的位置相关。f为缩放系数,取有理数且f∈(0.5,1]。O为偏移量,取有理数。round表示四舍五入的取整运算。 round还可以替换为其他取整运算,例如,采用向上取整、向下取整、向零取整等取整运算,本公开对此不作限制。
本实施例中,对于神经网络运算过程中的神经网络训练、神经网络推理、神经网络微调,可以进行8bit量化,即n为8,I x的取值范围为[-128,127]。
此时，用n位定点数可以表示浮点数的最大值A为2^s(2^(n-1)−1)，那么n位定点数可以表示待量化数据的数域中最大值为2^s(2^(n-1)−1)，n位定点数可以表示待量化数据的数域中最小值为−2^s(2^(n-1)−1)。由式(1-1)可知，采用第一种情况对应的量化参数对待量化数据进行量化时，量化间隔为2^s×f，量化间隔记为C。
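下面给出公式(1-1)、公式(1-2)的一个最小示意实现（仅为草图，假设待量化数据为NumPy数组，函数名为本文示例自拟，并非本公开限定的实现方式）：

```python
import numpy as np

def quantize(fx: np.ndarray, s: int, f: float, o: float, n: int) -> np.ndarray:
    # 公式(1-2)：I_x = round((F_x - O) / (2^s * f))，并截断到n位定点的表示范围
    ix = np.round((fx - o) / ((2.0 ** s) * f))
    return np.clip(ix, -2 ** (n - 1), 2 ** (n - 1) - 1)

def dequantize(ix: np.ndarray, s: int, f: float, o: float) -> np.ndarray:
    # 公式(1-1)：F_x ≈ I_x * 2^s * f + O
    return ix * (2.0 ** s) * f + o
```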
设Z为待量化数据的数域中所有浮点数的绝对值最大值（也即统计结果），则A需要包含Z，且Z要大于A/2，即Z > 2^(s-1)(2^(n-1)−1)，因此有如下公式(1-3)约束：
2 s(2 n-1-1)≥Z>2 s-1(2 n-1-1)   (1-3)
此时，根据公式(1-3)可得：
s = ceil(log2(Z/(2^(n-1)−1)))
f = Z/(2^s(2^(n-1)−1))
当 Z = 2^s(2^(n-1)−1) 时，根据公式(1-3)，Z可以无损精确表示。当f=1时，可以计算出 Z = 2^s(2^(n-1)−1)。
其中，n位定点数可以表示待量化数据的数域中最大值为(2^(n-1)−1)×2^s×f+O，n位定点数可以表示待量化数据的数域中最小值为−(2^(n-1)−1)×2^s×f+O。
可以根据公式(1-4)对数据x量化后的n位二进制表示值I_x进行反量化，获得反量化数据F̂_x：
F̂_x = I_x×2^s×f + O   (1-4)
其中，反量化数据F̂_x的数据格式与对应的量化前的数据F_x的数据格式相同，均为浮点值。
在本实施例中，在对应层的通道为单通道、或者对应层无通道时，缩放系数为：
f = Z/(2^s(2^(n-1)−1))
在本实施例中，在对应层的通道为多通道时，第c个通道的缩放系数为：
f^(c) = Z^(c)/(2^s(2^(n-1)−1))
其中，f^(c)为对应层第c个通道的缩放系数，Z^(c)为对应层第c个通道的统计结果。
在本实施例中，图2-2示出根据本公开实施例的对称的定点数表示的示意图。如图2-2所示的待量化数据的数域是以“0”为对称中心分布。Z为待量化数据的数域中所有浮点数的绝对值最大值，在图2-2中，A为n位定点数可以表示的浮点数的最大值，浮点数A转换为定点数是2^(n-1)−1。为了避免溢出，A需要包含Z。在实际运算中，神经网络运算过程中的浮点数据趋向于某个确定区间的正态分布，但是并不一定满足以“0”为对称中心的分布，这时用定点数表示时，容易出现溢出情况。为了改善这一情况，量化参数中引入偏移量。图2-3示出根据本公开实施例的引入偏移量的定点数表示的示意图。如图2-3所示。待量化数据的数域不是以“0”为对称中心分布，Z_min是待量化数据的数域中所有浮点数的最小值，Z_max是待量化数据的数域中所有浮点数的最大值。P为Z_min~Z_max之间的中心点，将待量化数据的数域整体偏移，使得平移后的待量化数据的数域以“0”为对称中心分布，平移后的待量化数据的数域中的绝对值最大值为Z。由图2-3可知，偏移量为“0”点到“P”点之间的水平距离，该距离称为偏移量O。其中：
O = (Z_min + Z_max)/2
在本实施例中,由公式(1-1)~公式(1-4)可知,点位置参数和缩放系数均与数据位宽有关。不同的数据位宽,导致点位置参数和缩放系数不同,从而影响量化精度。量化就是将以往用32bit或者64bit表达的高精度数转换成占用较少内存空间的定点数的过程,高精度数转换为定点数的过程就会在精度上引起一定的损失。在训练或微调过程中,在一定的迭代(iterations)的次数范围内,使用相同的数据位宽量化对神经网络运算的总体精度影响不大。超过一定的迭代次数,再使用同一数据位宽量化就无法满足训练或微调对精度的要求。这就需要随着训练或微调的过程对数据位宽n进行调整。简单地,可以人为将数据位宽n设置为预设值。在不同的迭代次数范围内,调用提前设置的对应的数据位宽n。
在一种可能的实现方式中,该方法还可以包括:根据数据位宽对应的量化误差,对数据位宽进行调整,以利用调整后的数据位宽确定量化参数。其中,量化误差是根据对应层中量化后的数据与对应的量化前的数据确定的。
在一种可能的实现方式中,根据数据位宽对应的量化误差,对数据位宽进行调整,可以包括:对量化误差与阈值进行比较,根据比较结果调整数据位宽。其中,阈值可以包括第一阈值和第二阈值中的至少一个。第一阈值大于第二阈值。
在一种可能的实现方式中,对量化误差与阈值进行比较,根据比较结果调整数据位,可以包括以下任一项:
在量化误差大于或等于第一阈值时,增加数据位宽;
在量化误差小于或等于第二阈值时,减少数据位宽;
在量化误差处于第一阈值和第二阈值之间时,数据位宽保持不变。
在该实现方式中,第一阈值和第二阈值可以为经验值,也可以为可变的超参数。常规的超参数的 优化方法均适于第一阈值和第二阈值,这里不再赘述超参数的优化方案。
需要强调的是,可以将数据位宽按照固定的位数步长进行调整,也可以根据量化误差与误差阈值之间的差值的不同,按照可变的调整步长调整数据位宽,最终根据神经网络运算过程的实际需要,将数据位宽调整的更长或更短。比如:当前卷积层的数据位宽n为16,根据量化误差将数据位宽n调整为12。也就是说,在实际应用中,数据位宽n取值为12而不必取值为16即可满足神经网络运算过程中对精度的需求,这样在精度允许范围内可以大大提到定点运算速度,从而提升了人工智能处理器芯片的资源利用率。
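结合上文的阈值比较规则，下面给出一个示意性的位宽调整草图（阈值t1>t2与步长step均为示例性超参数，函数名为示例自拟）：

```python
def adjust_bit_width(n: int, quant_error: float, t1: float, t2: float, step: int = 2) -> int:
    # 按上文规则：误差≥第一阈值t1则增加位宽；误差≤第二阈值t2则减少位宽；否则保持不变
    if quant_error >= t1:
        return n + step   # 量化误差过大，增加数据位宽以提高精度
    if quant_error <= t2:
        return n - step   # 量化误差很小，减少数据位宽以提升运算速度
    return n
```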
在一种可能的实现方式中,该方法还可以包括:对量化后的数据进行反量化,获得反量化数据,其中,反量化数据的数据格式与对应的量化前的数据的数据格式相同;根据量化后的数据以及对应的反量化数据确定量化误差。
在一种可能的实现方式中,量化前的数据可以是待量化数据。
在一种可能的实现方式中，处理器可以根据待量化数据及其对应的反量化数据计算获得量化误差。设待量化数据为Z=[z_1, z_2, …, z_m]，该待量化数据对应的反量化数据为Z^(n)=[z_1^(n), z_2^(n), …, z_m^(n)]。处理器可以根据该待量化数据Z及其对应的反量化数据Z^(n)确定误差项，并根据该误差项确定量化误差。
在一种可能的实现方式中,处理器可以分别计算待量化数据Z与对应的反量化数据Z (n)的差值,获得m个差值,并将该m个差值的和作为误差项。之后,处理器可以根据该误差项确定量化误差。具体的量化误差可以按照如下公式确定:
diff_bit = log2( Σ_i|z_i^(n) − z_i| / Σ_i|z_i| + 1 )
其中,i为待量化数据集合中第i个待量化数据的下标。i为大于或等于1、且小于或等于m的整数。
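上述误差计算可以用如下草图示意（误差公式的具体形式以本公开为准，这里按上文重构的公式实现，函数名为示例自拟）：

```python
import numpy as np

def quantization_error(z: np.ndarray, z_hat: np.ndarray) -> float:
    # z为待量化数据，z_hat为对应的反量化数据；
    # 以m个逐元素差值之和作为误差项，再做归一化
    err_term = np.sum(np.abs(z_hat - z))
    return float(np.log2(err_term / np.sum(np.abs(z)) + 1.0))
```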
应当理解的是,上述量化误差的确定方式仅是本公开的一个示例,本领域技术人员可以根据实际需要对量化误差的确定方式进行设置,本公开对此不作限制。
对于数据位宽来说,图2-4a、图2-4b为训练过程中神经网络的权值数据变动幅度曲线图。在图2-4a和图2-4b中,横坐标表示是迭代数,纵坐标表示是权值取对数后的最大值。图2-4a所示的权值数据变动幅度曲线展示神经网络的任一卷积层同一周期(epoch)内在不同迭代对应的权值数据变动情况。在图2-4b中,conv0层对应权值数据变动幅度曲线A,conv1层对应权值数据变动幅度曲线B,conv2层对应权值数据变动幅度曲线C,conv3层对应权值数据变动幅度曲线D,conv4层对应权值数据变动幅度曲线e。由图2-4a和图2-4b可知,同一个周期(epoch)内,在训练初期,每次迭代权值变化幅度比较大。在训练中后期,每次迭代权值的变化幅度不会太大。此种情况下,在训练中后期,因为每次迭代前后权值数据变化幅度不大,使得每代的对应层的权值数据之间在一定的迭代间隔内具有相似性,在神经网络训练过程中每层涉及的数据量化时可以采用上一迭代时对应层量化时使用的数据位宽。但是,在训练初期,由于每次迭代前后权值数据的变化幅度比较大,为了满足量化所需的浮点运算的精度,在训练初期的每一次迭代,利用上一代对应层量化时采用的数据位宽对当前代的对应层的权值数据进行量化,或者基于当前层预设的数据位宽n对当前层的权值数据进行量化,获得量化后的定点数。根据量化后的权值数据和对应的量化前的权值数据,确定量化误差,根据量化误差与阈值的比较结果,对上一代对应层量化时采用的数据位宽或者当前层预设的数据位宽进行调整,将调整后的数据位宽应用于当前代的对应层的权值数据的量化。进一步地,在训练或微调过程中,神经网络的每层之间的权值数据相互独立,不具备相似性。因权值数据不具备相似性使得每层之间的神经元数据 也相互独立,不具备相似性。因此,在神经网络训练或微调过程中,神经网络的每一迭代内的每层的数据位宽应用于对应层。上述以权值数据为例,在神经网络训练或微调过程中,神经元数据和梯度数据分别对应的数据位宽亦如此,此处不再赘述。
在一种可能的实现方式中,量化前的数据是在目标迭代间隔内的权值更新迭代过程中涉及的待量化数据。其中,目标迭代间隔可以包括至少一次权值更新迭代,且同一目标迭代间隔内量化过程中采用相同的数据位宽。
在一种可能的实现方式中,目标迭代间隔是根据在预判时间点权值更新迭代过程中涉及的待量化数据的点位置参数的变化趋势值确定的。或者目标迭代间隔是根据在预判时间点权值更新迭代过程中涉及的待量化数据的点位置参数的变化趋势值和数据位宽的变化趋势值确定的。其中,预判时间点是用于判断是否需要对数据位宽进行调整的时间点,预判时间点对应权值更新迭代完成时的时间点。
在一种可能的实现方式中,目标迭代间隔的确定步骤可以包括:
在预判时间点,确定权值迭代过程中待量化数据对应点位置参数的变化趋势值;
根据点位置参数的变化趋势值确定对应目标迭代间隔。
在该实现方式中,按照式(1-6),点位置参数的变化趋势值根据当前预判时间点对应的权值迭代过程中的点位置参数的滑动平均值、上一预判时间点对应的权值迭代过程中的点位置参数的滑动平均值确定,或者根据当前预判时间点对应的权值迭代过程中的点位置参数、上一预判时间点对应的权值迭代过程中的点位置参数的滑动平均值确定。公式(1-6)的表达式为:
diff_update1 = |M^(t) − M^(t-1)| = α|s^(t) − M^(t-1)|    (1-6)
式(1-6)中，M为点位置参数s随着训练迭代增加的滑动平均值。其中，M^(t)为第t个预判时间点对应的点位置参数s随着训练迭代增加的滑动平均值，根据公式(1-7)获得M^(t)。s^(t)为第t个预判时间点对应的点位置参数s。M^(t-1)为第t-1个预判时间点对应的点位置参数s的滑动平均值，α为超参数。diff_update1衡量点位置参数s的变化趋势，点位置参数s的变化也变相体现在当前待量化数据中数据最大值Z_max的变化情况中。diff_update1越大，说明数值范围变化剧烈，需要间隔更短的更新频率，即目标迭代间隔更小。
M^(t) ← α×s^(t) + (1−α)×M^(t-1)    (1-7)
在该实现方式中,根据式(1-8)确定目标迭代间隔。对于目标迭代间隔来说,同一目标迭代间隔内量化过程中采用相同的数据位宽,不同目标迭代间隔内量化过程中采用的数据位宽可以相同,也可以不同。
I = β/diff_update1 − γ    (1-8)
式(1-8)中,I为目标迭代间隔。β、γ为超参数。diff update1为点位置参数的变化趋势值。
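公式(1-6)~(1-8)的计算流程可以用如下草图示意（α、β、γ均为超参数，公式(1-8)按上文重构的形式实现，仅为示意）：

```python
def update_trend_and_interval(s_t: float, m_prev: float,
                              alpha: float, beta: float, gamma: float):
    # 公式(1-7)：更新点位置参数的滑动平均值 M(t)
    m_t = alpha * s_t + (1 - alpha) * m_prev
    # 公式(1-6)：diff_update1 = |M(t) - M(t-1)| = alpha * |s(t) - M(t-1)|
    diff_update1 = abs(m_t - m_prev)
    # 公式(1-8)：目标迭代间隔 I = beta / diff_update1 - gamma（至少为1次迭代）
    interval = max(1, int(beta / diff_update1 - gamma)) if diff_update1 > 0 else 1
    return m_t, diff_update1, interval
```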
在该实现方式中,预判时间点包括第一预判时间点,根据目标迭代间隔确定第一预判时间点。具 体地,在训练或微调过程中的第t个预判时间点,利用上一代对应层量化时采用的数据位宽对当前代的对应层的权值数据进行量化,获得量化后的定点数,根据量化前的权值数据和对应的量化前的权值数据,确定量化误差。将量化误差分别与第一阈值和第二阈值进行比较,利用比较结果确定是否对上一代对应层量化时采用的数据位宽进行调整。假如:第t个第一预判时间点对应第100代,第99代使用的数据位宽为n 1。在第100代,根据数据位宽n 1确认量化误差,将量化误差与第一阈值、第二阈值进行比较,获得比较结果。如果根据比较结果确认数据位宽n 1无需改变,利用式(1-8)确认目标迭代间隔为8代,当第100代作为当前目标迭代间隔内的起始迭代,那么第100代~第107代作为当前目标迭代间隔,当第100代作为上一目标迭代间隔的最末迭代,那么第101代~第108代作为当前目标迭代间隔。在当前目标迭代间隔内量化时每代仍然延用上一个目标迭代间隔所使用的数据位宽n 1。这种情况,不同的目标迭代间隔之间量化时所使用的数据位宽可以相同。如果以第100代~第107代作为当前的目标迭代间隔,那么下一个目标迭代间隔内的第108代作为第t+1个第一预判时间点,如果第101代~第108代作为当前的目标迭代间隔,那么当前的目标迭代间隔内的第108代作为第t+1个第一预判时间点。在第t+1个第一预判时间点,根据数据位宽n 1确认量化误差,将量化误差与第一阈值、第二阈值进行比较,获得比较结果。根据比较结果确定数据位宽n 1需要更改为n 2,并利用式(1-8)确认目标迭代间隔为55代。那么第108代~第163代或者第109代~第163代作为目标迭代间隔,在该目标迭代间隔内量化时每代使用数据位宽n 2。这种情况下,不同的目标迭代间隔之间量化时所使用的数据位宽可以不同。
在该实现方式中,不管第一预判时间点是目标迭代间隔内的起始迭代还是最末迭代,均适于式(1-6)来获得点位置参数的变化趋势值。如果当前时刻的第一预判时间点为当前目标迭代间隔的起始迭代,那么在式(1-6)中,M (t)为当前目标迭代间隔的起始迭代对应时间点所对应的点位置参数s随着训练迭代增加的滑动平均值,s (t)为当前目标迭代间隔的起始迭代对应时间点所对应的点位置参数s,M (t-1)为上一目标迭代间隔的起始迭代对应时间点所对应的点位置参数s随着训练迭代增加的滑动平均值。如果当前时刻的第一预判时间点为当前目标迭代间隔的最末迭代,那么在式(1-6)中,M (t)为当前目标迭代间隔的最末迭代对应时间点所对应的点位置参数s随着训练迭代增加的滑动平均值,s (t)为当前目标迭代间隔的最末迭代对应时间点所对应的点位置参数s,M (t-1)为上一目标迭代间隔的最末迭代对应时间点所对应的点位置参数s随着训练迭代增加的滑动平均值。
在该实现方式中,在包括第一预判时间点的基础上,预判时间点还可以包括第二预判时间点。第二预判时间点是根据数据变动幅度曲线确定的。基于大数据在神经网络训练过程中数据变动幅度情况,获得如图2-4a所示的数据变动幅度曲线。
以权值数据为例,由图2-4a所示的数据变动幅度曲线可知,从训练开始到第T代的迭代间隔周期内,每次权值更新时,数据变动幅度非常大。在当前预判时间点,量化时,当前代先利用上一代的数据位宽n 1进行量化,获得的量化结果与对应的量化前的数据确定对应的量化误差,量化误差分别与第一阈值、第二阈值进行比较,根据比较结果对数据位宽n 1进行调整,获得数据位宽n 2。利用数据位宽n 2对当前代涉及的待量化权值数据进行量化。然后根据式(1-8)确定目标迭代间隔,从而确定第一预判时间点,在第一预判时间点再判断是否调整数据位宽以及如何调整,并根据公式(1-8) 确定下一目标迭代间隔来获得下一个第一预判时间点。由于训练开始到第T代的迭代间隔周期内,每一次迭代前后权值数据变化幅度非常大,使得每代的对应层的权值数据之间不具有相似性,为了满足精度问题,量化时当前代的每层的数据不能延用上一代的对应层的对应量化参数,在前T代可以代代调整数据位宽,此时,量化时前T代中每代使用的数据位宽均不同,目标迭代间隔为1代。为了人工智能处理器芯片的资源达到最优化利用,前T代的目标迭代间隔可以根据图2-4a所示的数据变动幅度曲线图所揭示的规律提前预设好,即:根据数据变动幅度曲线前T代的目标迭代间隔直接预设,无需经过公式(1-8)确认前T代的每代对应的权值更新迭代完成时的时间点作为第二预判时间点。从而使得人工智能处理器芯片的资源更为合理的利用。图2-4a所示的数据变动幅度曲线从第T代开始变动幅度不大,在训练的中后期不用代代都重新确认量化参数,在第T代或者第T+1代,利用当前代对应量化前的数据以及量化后的数据确定量化误差,根据量化误差确定对数据位宽是否需要调整以及如何调整,还要根据公式(1-8)确定目标迭代间隔。如果确认的目标迭代间隔为55代,这就要求从第T代或第T+1之后隔55代对应的时间点作为第一预判时间点再判断是否调整数据位宽以及如何调整,并根据公式(1-8)确定下一目标迭代间隔,从而确定下一个第一预判时间点,直至同一周期(epoch)内所有代运算完成。在此基础上,在每个周期(epoch)之后,再对数据位宽或量化参数做适应性调整,最终使用量化后的数据获得精度符合预期的神经网络。
在该实现方式中,假如:根据图2-4a所示的权值数据变动幅度曲线图确定T取值为130(这个数值与图2-4a不对应,为方便描述,仅仅是假设T取值为130,不限于在假设值。),那么训练过程中的第130代作为第二预判时间点,当前的第一预判时间点为训练过程中的第100代,在第100代,经公式(1-8)确定目标迭代间隔为35代。在该目标迭代间隔内,训练至第130代,到达第二预判时间点,此时就要在第130代对应的时间点确定对数据位宽是否需要调整以及如何调整,还要根据公式(1-8)确定目标迭代间隔。假如该情况下确定的目标迭代间隔为42代。就要从第130代起至第172代作为目标迭代间隔,目标迭代间隔为35代时确定的第一预判时间点对应的第135代处于目标迭代间隔为42代内,在第135代,可以再根据公式(1-8)判断是否需要调整数据位宽以及如何调整。也可以不在第135代做评估预判,直接到第172代再执行是否需要调整数据位宽的评估以及如何调整。总之,是否在第135代进行评估和预判均适于本公开所提供的技术方案。
综上,根据数据变动幅度曲线提前预设第二预判时间点,在训练或微调的初期,无需花费人工智能处理器芯片的资源来确定目的迭代间隔,在预设好的第二预判时间点上直接根据量化误差来调整数据位宽,并利用调整好的数据位宽来量化当前代涉及的待量化数据。在训练或微调的中后期,根据公式(1-8)获得目标迭代间隔,从而确定对应的第一预判时间点,在每个第一预判时间点上确定是否调整数据位宽以及如何调整。这样在能够满足神经网络运算所需的浮点运算的精度的同时合理利用人工智能处理器芯片的资源,大大提高了量化时的效率。
在一种可能的实现方式中,为了获得更准确的数据位宽的目标迭代间隔,不仅仅根据点位置参数的变化趋势值,可以同时考虑点位置参数的变化趋势值和数据位宽的变化趋势值。目标迭代间隔的确定步骤可以包括:
在预判时间点,确定权值迭代过程中待量化数据对应点位置参数的变化趋势值、数据位宽的变化趋势值;其中,预判时间点是用于判断是否需要对数据位宽进行调整的时间点,预判时间点对应权值更新迭代完成时的时间点;
根据点位置参数的变化趋势值和数据位宽的变化趋势值确定对应目标迭代间隔。
在该实现方式中,可以根据式(1-9)来利用对应量化误差确定数据位宽的变化趋势值。
diff_update2 = δ×diff_bit^2    (1-9)
式(1-9)中,δ为超参数,diff bit为量化误差;diff update2为数据位宽的变化趋势值。diff update2衡量量化时采用的数据位宽n的变化趋势,diff update2越大越有可能需要更新定点的位宽,需要间隔更短的更新频率。
在该实现方式中,点位置参数的变化趋势值仍然可根据式(1-6)获得,对于式(1-6)中的M (t)根据公式(1-7)获得。diff update1衡量点位置参数s变化趋势,由于点位置参数s的变化也变相体现在当前待量化数据中数据最大值Z max的变化情况。diff update1越大,说明数值范围变化剧烈,需要间隔更短的更新频率,即目标迭代间隔更小。
在该实现方式中,根据式(1-10)确定目标迭代间隔。对于目标迭代间隔来说,同一目标迭代间隔内量化过程中采用相同的数据位宽,不同目标迭代间隔内量化过程中采用的数据位宽可以相同,也可以不同。
I = β/max(diff_update1, diff_update2) − γ    (1-10)
式(1-10)中,I为目标迭代间隔。β、γ为超参数。diff update1为点位置参数的变化趋势值。diff update2为数据位宽的变化趋势值。
在该实现方式中，diff_update1是用来衡量点位置参数s的变化情况的，但是由数据位宽n的变化而导致的点位置参数s的变化是要忽略掉的，因为数据位宽n的变化已经在diff_update2中体现过了。如果在diff_update1中不做这个忽略的操作，那么根据式(1-10)确定的目标迭代间隔I是不准确的，造成第一预判时间点过多，在训练或微调过程中，易频繁地做数据位宽n是否更新以及如何更新的操作，从而造成人工智能处理器芯片的资源没有合理利用。
在该实现方式中,diff update1根据M (t)确定。假设第t-1个预判时间点对应的数据位宽为n 1,对应的点位置参数为s 1,点位置参数随着训练迭代增加的滑动平均值为m 1。利用数据位宽n 1对待量化数据进行量化,获得量化后的定点数。根据量化前的数据和对应的量化后的数据,确定量化误差diff bit,根据量化误差diff bit与阈值的比较结果,将数据位宽n 1调整为n 2,数据位宽调整了|n 1-n 2|位,第t个预判时间点量化时使用的数据位宽为n 2。为了忽略由数据位宽的变化而导致的点位置参数的变化,在确定M (t)时可以选出下述两种优化方式中的其中一种即可。第一种方式:如果数据位宽增加了 |n 1-n 2|位,则s (t-1)取值为s 1-|n 1-n 2|,M (t-1)取值为m 1-|n 1-n 2|,将s (t-1)、M (t-1)代入公式(1-7)中,获得M (t),即为第t个预判时间点对应的点位置参数随着训练迭代增加的滑动平均值。如果数据位宽减少了|n 1-n 2|位,则s (t-1)取值为s 1+|n 1-n 2|,M (t-1)取值为m 1+|n 1-n 2|,将s (t-1)、M (t-1)代入公式(1-7)中,获得M (t),即为第t个预判时间点对应的点位置参数随着训练迭代增加的滑动平均值。第二种方式:不管数据位宽是增加了|n 1-n 2|位还是减少了|n 1-n 2|,s (t-1)取值为s 1,M (t-1)取值为m 1,将s (t-1)、M (t-1)代入公式(1-7)中,获得M (t)。在数据位宽增加|n 1-n 2|位时,将M (t)减去|n 1-n 2|,在数据位宽减少|n 1-n 2|位时,将M (t)加上|n 1-n 2|,结果作为第t个预判时间点对应的点位置参数随着训练迭代增加的滑动平均值。这两种方式是等价的,均可以忽略由数据位宽的变化而导致的点位置参数的变化,获得更为精准的目标迭代间隔,从而提高人工智能处理器芯片的资源利用率。
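上文“第一种方式”可以用如下草图示意（仅为示意实现，用于忽略位宽变化带来的点位置变化，函数名为示例自拟）：

```python
def corrected_moving_average(s1: float, m1: float, n1: int, n2: int, alpha: float) -> float:
    # 数据位宽由n1调整为n2：位宽增加时delta>0，点位置及其滑动平均值相应平移
    delta = n2 - n1
    s_prev = s1 - delta      # 平移后的 s(t-1)
    m_prev = m1 - delta      # 平移后的 M(t-1)
    return alpha * s_prev + (1 - alpha) * m_prev   # 代入公式(1-7)得到 M(t)
```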
在实际应用中,数据位宽n和点位置参数s对量化影响很大,量化参数中的缩放系数f以及偏移量O对量化影响不大。所以,不管数据位宽n是否发生变化、点位置参数s可变的情况下,确定点位置参数s的目标迭代间隔也是一件非常有意义的事情。
在一种可能的实现方式中,确定目标迭代间隔的过程可以包括以下步骤:
在预判时间点,确定权值迭代过程中涉及的待量化数据对应点位置参数的变化趋势值;其中,预判时间点是用于判断是否需要对量化参数进行调整的时间点,预判时间点对应权值更新迭代完成时的时间点;
根据点位置参数的变化趋势值确定对应目标迭代间隔。
在该实现方式中,量化参数优选为点位置参数。
需要说明的是,关于上述确定数据位宽的目标迭代间隔和量化参数的目标迭代间隔均仅仅是例举的部分情况,本领域技术人员在理解本申请技术方案的精髓的情况下,可能会在本申请技术方案的基础上产生其它的变形或者变换,本公开对此不作限制。
利用上述方法确定量化参数,根据量化误差对数据位宽或量化参数进行调整,并确定了对数据位宽或量化参数是否调整的目标迭代间隔,达到神经网络运算过程中在适合的时间点对数据位宽或量化参数进行调整,使得在合适的迭代时间点使用合适的量化参数,实现人工智能处理器芯片执行神经网络运算达到定点运算的速度,提升了人工智能处理器芯片的峰值算力的同时满足运算所需的浮点运算的精度。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本公开并不受所描述的动作顺序的限制,因为依据本公开,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于可选实施例,所涉及的动作和模块并不一定是本公开所必须的。
进一步需要说明的是,虽然图2-1的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图2-1中的至少一部分步骤可以包括多个子步 骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
本公开实施例还提供了一种非易失性计算机可读存储介质,其上存储有计算机程序指令,计算机程序指令被处理器执行时实现上述神经网络的数据量化处理方法。
图2-5示出根据本公开一实施例的神经网络的数据量化处理装置的框图。如图2-5所示,该装置应用于如图1所示的处理器100,该装置包括数据统计模块61、量化参数确定模块62和量化处理模块63。其中,某一个处理单元101中设置有数据统计模块61、量化参数确定模块62和量化处理模块63。或者,数据统计模块61、量化参数确定模块62和量化处理模块63分别设置在不同的处理单元101中。存储单元102用于存储待量化数据、统计结果、量化参数、数据位宽等与数据统计模块61、量化参数确定模块62和量化处理模块63的运行相关的数据。
数据统计模块61,按照神经网络运算过程中对应层以及对应层中的通道数统计待量化数据,确定每种待量化数据的统计结果。量化参数确定模块62,根据统计结果以及数据位宽,确定对应层中每种待量化数据的量化参数。量化处理模块63,利用对应的量化参数对待量化数据进行量化。其中,待量化数据包括神经网络的神经元、权值、梯度中的至少一种数据,量化参数包括点位置参数、缩放系数和偏移量。
在一种可能的实现方式中,在对应层的通道为单通道、或者对应层无通道时,对应层中每种待量化数据对应的量化参数和数据位宽相同。
在一种可能的实现方式中,在对应层的通道为多通道时,对应层的同一通道中的权值的缩放系数、偏移量相同,对应层中所有通道中的权值对应的点位置参数和数据位宽相同。对应层的所有通道中的神经元对应的量化参数和数据位宽相同。对应层的所有通道中的梯度对应的量化参数和数据位宽相同。
在一种可能的实现方式中,神经网络运算过程可以包括神经网络训练、神经网络推理、神经网络微调中的至少一种运算。
在一种可能的实现方式中,统计结果可以包括以下任一种:每种待量化数据中的绝对值最大值、每种待量化数据中的最大值和最小值的距离的二分之一。其中,绝对值最大值是每种待量化数据中的最大值或最小值的绝对值。
在一种可能的实现方式中,缩放系数可以是根据点位置参数、统计结果、数据位宽确定的。
在一种可能的实现方式中,偏移量可以是根据每种待量化数据的统计结果确定的。
在一种可能的实现方式中,点位置参数可以是根据统计结果和数据位宽确定的。
在一种可能的实现方式中,数据位宽可以为预设值。
在一种可能的实现方式中,该装置还可以包括:位宽调整模块,根据数据位宽对应的量化误差,对数据位宽进行调整,以利用调整后的数据位宽确定量化参数。其中,量化误差是根据对应层中量化后的数据与对应的量化前的数据确定的。
在一种可能的实现方式中,位宽调整模块,可以包括:调整子模块,对量化误差与阈值进行比较,根据比较结果调整数据位宽。其中,阈值包括第一阈值和第二阈值中的至少一个。
在一种可能的实现方式中,对量化误差与阈值进行比较,根据比较结果调整数据位宽,可以包括以下任一项:在量化误差大于或等于第一阈值时,增加数据位宽;在量化误差小于或等于第二阈值时,减少数据位宽;在量化误差处于第一阈值和第二阈值之间时,数据位宽保持不变。
在一种可能的实现方式中,装置还可以包括反量化处理模块和量化误差确定模块。
反量化处理模块,对量化后的数据进行反量化,获得反量化数据,其中,反量化数据的数据格式与对应的量化前的数据的数据格式相同。
量化误差确定模块,根据量化后的数据以及对应的反量化数据确定量化误差。
在一种可能的实现方式中,量化前的数据可以是待量化数据。
在一种可能的实现方式中,量化前的数据可以是在目标迭代间隔内的权值更新迭代过程中涉及的待量化数据。其中,目标迭代间隔可以包括至少一次权值更新迭代,且同一目标迭代间隔内量化过程中采用相同的数据位宽。
在一种可能的实现方式中,目标迭代间隔可以是根据在预判时间点权值更新迭代过程中涉及的待量化数据的点位置参数的变化趋势值确定的。或者目标迭代间隔可以是根据在预判时间点权值更新迭代过程中涉及的待量化数据的点位置参数的变化趋势值和数据位宽的变化趋势值确定的。其中,预判时间点可以是用于判断是否需要对数据位宽进行调整的时间点,预判时间点对应权值更新迭代完成时的时间点。
在一种可能的实现方式中,点位置参数的变化趋势值可以是根据当前预判时间点对应的点位置参数的滑动平均值、上一预判时间点对应的点位置参数的滑动平均值确定的。或者,点位置参数的变化趋势值可以是根据当前预判时间点对应的点位置参数、上一预判时间点对应的点位置参数的滑动平均值确定的。其中,数据位宽的变化趋势值可以是根据对应量化误差确定的。
在一种可能的实现方式中,该装置还可以包括第一滑动平均值确定模块。该第一滑动平均值确定模块被配置为:根据上一预判时间点对应的点位置参数与数据位宽的调整值确定当前预判时间点对应的点位置参数;根据数据位宽的调整值对上一预判时间点对应的点位置参数的滑动平均值进行调整,获得调整结果;根据当前预判时间点对应的点位置参数、调整结果确定当前预判时间点对应的点位置参数的滑动平均值。
在一种可能的实现方式中,该装置还可以包括第二滑动平均值确定模块。该第二滑动平均值确定模块被配置为:根据上一预判时间点对应的点位置参数与上一预判时间点对应的点位置参数的滑动平均值确定当前预判时间点对应的点位置参数的滑动平均值的中间结果;根据当前预判时间点对应的点位置参数的滑动平均值的中间结果与数据位宽的调整值确定当前预判时间点对应的点位置参数的滑动平均值。
在一种可能的实现方式中,量化前的数据可以是在目标迭代间隔内的权值更新迭代时涉及的待量化数据。其中,目标迭代间隔可以包括至少一次权值更新迭代,且同一目标迭代间隔内量化过程中采用相同的量化参数。其中,目标迭代间隔是根据在预判时间点权值更新迭代过程中涉及的待量化数据的点位置参数的变化趋势值确定的。预判时间点是用于判断是否需要对量化参数进行调整的时间点,预判时间点对应权值更新迭代完成时的时间点。
本公开实施例所提供的神经网络的数据量化处理装置,利用对应的量化参数对待量化数据进行量化,在保证精度的同时,减小了存储数据所占用的存储空间,保证了运算结果的准确性和可靠性,且能够提高运算的效率,且量化同样缩减了神经网络模型的大小,降低了对运行该神经网络模型的终端的性能要求,使神经网络模型可以应用于算力、体积、功耗相对受限的手机等终端。
依据以下条款可更好地理解前述内容:
条款A1.一种神经网络的数据量化处理方法,应用于处理器,所述方法包括:
按照神经网络运算过程中对应层以及对应层中的通道数统计待量化数据,确定每种待量化数据的统计结果;
根据统计结果以及数据位宽,确定对应层中每种待量化数据的量化参数;
利用对应的量化参数对所述待量化数据进行量化,
其中,所述待量化数据包括所述神经网络的神经元、权值、梯度中的至少一种数据,所述量化参数包括点位置参数、缩放系数和偏移量。
条款A2.根据条款A1所述的方法,在对应层的通道为单通道、或者对应层无通道时,对应层中每种待量化数据对应的量化参数和数据位宽相同。
条款A3.根据条款A1所述的方法,在对应层的通道为多通道时,对应层的同一通道中的权值的缩放系数、偏移量相同,对应层中所有通道中的权值对应的点位置参数和数据位宽相同,对应层的所有通道中的神经元对应的量化参数和数据位宽相同,对应层的所有通道中的梯度对应的量化参数和数据位宽相同。
条款A4.根据条款A1至条款A3任一项所述的方法,所述神经网络运算过程包括神经网络训练、神经网络推理、神经网络微调中的至少一种运算。
条款A5.根据条款A1至条款A3任一项所述的方法,所述统计结果包括以下任一种:每种待量化数据中的绝对值最大值、每种待量化数据中的最大值和最小值的距离的二分之一,
其中,所述绝对值最大值是每种待量化数据中的最大值或最小值的绝对值。
条款A6.根据条款A1至条款A3任一项所述的方法,所述缩放系数是根据所述点位置参数、所述统计结果、所述数据位宽确定的。
条款A7.根据条款A1至条款A3任一项所述的方法,所述偏移量是根据每种待量化数据的统计结果确定的。
条款A8.根据条款A1至条款A3任一项所述的方法,所述点位置参数是根据所述统计结果和所述数据位宽确定的。
条款A9.根据条款A1至条款A3任一项所述的方法,所述数据位宽为预设值。
条款A10.根据条款A1至条款A3任一项所述的方法,所述方法还包括:
根据所述数据位宽对应的量化误差,对所述数据位宽进行调整,以利用调整后的数据位宽确定量化参数,
其中,所述量化误差是根据对应层中量化后的数据与对应的量化前的数据确定的。
条款A11.根据条款A10所述的方法,根据所述数据位宽对应的量化误差,对所述数据位宽进行调整,包括:
对所述量化误差与阈值进行比较,根据比较结果调整所述数据位宽,
其中,所述阈值包括第一阈值和第二阈值中的至少一个。
条款A12.根据条款A11所述的方法,对所述量化误差与阈值进行比较,根据比较结果调整所述数据位宽,包括以下任一项:
在所述量化误差大于或等于所述第一阈值时,增加所述数据位宽;
在所述量化误差小于或等于所述第二阈值时,减少所述数据位宽;
在所述量化误差处于所述第一阈值和所述第二阈值之间时,所述数据位宽保持不变。
条款A13.根据条款A10所述的方法,所述方法还包括:
对量化后的数据进行反量化,获得反量化数据,其中,所述反量化数据的数据格式与对应的量化前的数据的数据格式相同;
根据所述量化后的数据以及对应的反量化数据确定所述量化误差。
条款A14.根据条款A10所述的方法,所述量化前的数据是所述待量化数据。
条款A15.根据条款A10所述的方法,所述量化前的数据是在目标迭代间隔内的权值更新迭代过程中涉及的待量化数据;
其中,所述目标迭代间隔包括至少一次权值更新迭代,且同一目标迭代间隔内量化过程中采用相同的所述数据位宽。
条款A16.根据条款A15所述的方法,所述目标迭代间隔是根据在预判时间点权值更新迭代过程中涉及的待量化数据的点位置参数的变化趋势值确定的,
或者所述目标迭代间隔是根据在预判时间点权值更新迭代过程中涉及的待量化数据的点位置参数的变化趋势值和数据位宽的变化趋势值确定的,
其中,所述预判时间点是用于判断是否需要对所述数据位宽进行调整的时间点,所述预判时间点对应权值更新迭代完成时的时间点。
条款A17.根据条款A16所述的方法,所述点位置参数的变化趋势值是根据当前预判时间点对应的点位置参数的滑动平均值、上一预判时间点对应的点位置参数的滑动平均值确定的,
或者,所述点位置参数的变化趋势值是根据当前预判时间点对应的点位置参数、上一预判时间点对应的点位置参数的滑动平均值确定的,
其中,所述数据位宽的变化趋势值是根据对应所述量化误差确定的。
条款A18.根据条款A17所述的方法,所述当前预判时间点对应的点位置参数的滑动平均值的确定步骤包括:
根据上一预判时间点对应的点位置参数与所述数据位宽的调整值确定所述当前预判时间点对应的点位置参数;
根据所述数据位宽的调整值对所述上一预判时间点对应的点位置参数的滑动平均值进行调整,获得调整结果;
根据所述当前预判时间点对应的点位置参数、所述调整结果确定当前预判时间点对应的点位置参数的滑动平均值。
条款A19.根据条款A17所述的方法,所述当前预判时间点对应的点位置参数的滑动平均值的确定步骤包括:
根据上一预判时间点对应的点位置参数与上一预判时间点对应的点位置参数的滑动平均值确定当前预判时间点对应的点位置参数的滑动平均值的中间结果;
根据当前预判时间点对应的点位置参数的滑动平均值的中间结果与所述数据位宽的调整值确定所述当前预判时间点对应的点位置参数的滑动平均值。
条款A20.根据条款A10所述的方法,所述量化前的数据是在目标迭代间隔内的权值更新迭代时涉及的待量化数据;其中,所述目标迭代间隔包括至少一次权值更新迭代,且同一目标迭代间隔内量化过程中采用相同的所述量化参数,
其中,所述目标迭代间隔是根据在预判时间点权值更新迭代过程中涉及的待量化数据的点位置参数的变化趋势值确定的,
所述预判时间点是用于判断是否需要对所述量化参数进行调整的时间点,所述预判时间点对应权值更新迭代完成时的时间点。
条款A21.一种神经网络的数据量化处理装置,应用于处理器,所述装置包括:
数据统计模块,按照神经网络运算过程中对应层以及对应层中的通道数统计待量化数据,确定每 种待量化数据的统计结果;
量化参数确定模块,根据统计结果以及数据位宽,确定对应层中每种待量化数据的量化参数;
量化处理模块,利用对应的量化参数对所述待量化数据进行量化,
其中,所述待量化数据包括所述神经网络的神经元、权值、梯度中的至少一种数据,所述量化参数包括点位置参数、缩放系数和偏移量。
条款A22.根据条款A21所述的装置,在对应层的通道为单通道、或者对应层无通道时,对应层中每种待量化数据对应的量化参数和数据位宽相同。
条款A23.根据条款A21所述的装置,在对应层的通道为多通道时,对应层的同一通道中的权值的缩放系数、偏移量相同,对应层中所有通道中的权值对应的点位置参数和数据位宽相同,对应层的所有通道中的神经元对应的量化参数和数据位宽相同,对应层的所有通道中的梯度对应的量化参数和数据位宽相同。
条款A24.根据条款A21至条款A23任一项所述的装置,所述神经网络运算过程包括神经网络训练、神经网络推理、神经网络微调中的至少一种运算。
条款A25.根据条款A21至条款A23任一项所述的装置,所述统计结果包括以下任一种:每种待量化数据中的绝对值最大值、每种待量化数据中的最大值和最小值的距离的二分之一,
其中,所述绝对值最大值是每种待量化数据中的最大值或最小值的绝对值。
条款A26.根据条款A21至条款A23任一项所述的装置,所述缩放系数是根据所述点位置参数、所述统计结果、所述数据位宽确定的。
条款A27.根据条款A21至条款A23任一项所述的装置,所述偏移量是根据每种待量化数据的统计结果确定的。
条款A28.根据条款A21至条款A23任一项所述的装置,所述点位置参数是根据所述统计结果和所述数据位宽确定的。
条款A29.根据条款A21至条款A23任一项所述的装置,所述数据位宽为预设值。
条款A30.根据条款A21至条款A23任一项所述的装置,所述装置还包括:
位宽调整模块,根据所述数据位宽对应的量化误差,对所述数据位宽进行调整,以利用调整后的数据位宽确定量化参数,
其中,所述量化误差是根据对应层中量化后的数据与对应的量化前的数据确定的。
条款A31.根据条款A30所述的装置,所述位宽调整模块,包括:
调整子模块,对所述量化误差与阈值进行比较,根据比较结果调整所述数据位宽,
其中,所述阈值包括第一阈值和第二阈值中的至少一个。
条款A32.根据条款A31所述的装置,对所述量化误差与阈值进行比较,根据比较结果调整所述数据位宽,包括以下任一项:
在所述量化误差大于或等于所述第一阈值时,增加所述数据位宽;
在所述量化误差小于或等于所述第二阈值时,减少所述数据位宽;
在所述量化误差处于所述第一阈值和所述第二阈值之间时,所述数据位宽保持不变。
条款A33.根据条款A30所述的装置,所述装置还包括:
反量化处理模块,对量化后的数据进行反量化,获得反量化数据,其中,所述反量化数据的数据格式与对应的量化前的数据的数据格式相同;
量化误差确定模块,根据所述量化后的数据以及对应的反量化数据确定所述量化误差。
条款A34.根据条款A30所述的装置,所述量化前的数据是所述待量化数据。
条款A35.根据条款A30所述的装置,所述量化前的数据是在目标迭代间隔内的权值更新迭代过程中涉及的待量化数据;
其中,所述目标迭代间隔包括至少一次权值更新迭代,且同一目标迭代间隔内量化过程中采用相同的所述数据位宽。
条款A36.根据条款A35所述的装置,所述目标迭代间隔是根据在预判时间点权值更新迭代过程中涉及的待量化数据的点位置参数的变化趋势值确定的,
或者所述目标迭代间隔是根据在预判时间点权值更新迭代过程中涉及的待量化数据的点位置参数的变化趋势值和数据位宽的变化趋势值确定的,
其中,所述预判时间点是用于判断是否需要对所述数据位宽进行调整的时间点,所述预判时间点对应权值更新迭代完成时的时间点。
条款A37.根据条款A36所述的装置,所述点位置参数的变化趋势值是根据当前预判时间点对应的点位置参数的滑动平均值、上一预判时间点对应的点位置参数的滑动平均值确定的,
或者,所述点位置参数的变化趋势值是根据当前预判时间点对应的点位置参数、上一预判时间点对应的点位置参数的滑动平均值确定的,
其中,所述数据位宽的变化趋势值是根据对应所述量化误差确定的。
条款A38.根据条款A37所述的装置,所述装置还包括:
第一滑动平均值确定模块,根据上一预判时间点对应的点位置参数与所述数据位宽的调整值确定所述当前预判时间点对应的点位置参数;
根据所述数据位宽的调整值对所述上一预判时间点对应的点位置参数的滑动平均值进行调整,获得调整结果;
根据所述当前预判时间点对应的点位置参数、所述调整结果确定当前预判时间点对应的点位置参数的滑动平均值。
条款A39.根据条款A37所述的装置,所述装置还包括:
第二滑动平均值确定模块,根据上一预判时间点对应的点位置参数与上一预判时间点对应的点位置参数的滑动平均值确定当前预判时间点对应的点位置参数的滑动平均值的中间结果;
根据当前预判时间点对应的点位置参数的滑动平均值的中间结果与所述数据位宽的调整值确定所述当前预判时间点对应的点位置参数的滑动平均值。
条款A40.根据条款A30所述的装置,所述量化前的数据是在目标迭代间隔内的权值更新迭代时涉及的待量化数据;其中,所述目标迭代间隔包括至少一次权值更新迭代,且同一目标迭代间隔内量化过程中采用相同的所述量化参数,
其中,所述目标迭代间隔是根据在预判时间点权值更新迭代过程中涉及的待量化数据的点位置参数的变化趋势值确定的,
所述预判时间点是用于判断是否需要对所述量化参数进行调整的时间点,所述预判时间点对应权值更新迭代完成时的时间点。
条款A41.一种人工智能芯片,所述芯片包括如条款A21至条款A40中任意一项所述的神经网络的数据量化处理装置。
条款A42.一种电子设备,所述电子设备包括如条款A41所述的人工智能芯片。
条款A43.一种板卡,所述板卡包括:存储器件、接口装置和控制器件以及如条款A41所述的人工 智能芯片;
其中,所述人工智能芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;
所述存储器件,用于存储数据;
所述接口装置,用于实现所述人工智能芯片与外部设备之间的数据传输;
所述控制器件,用于对所述人工智能芯片的状态进行监控。
条款A44.根据条款A43所述的板卡,
所述存储器件包括:多组存储单元,每一组所述存储单元与所述人工智能芯片通过总线连接,所述存储单元为:DDR SDRAM;
所述芯片包括:DDR控制器,用于对每个所述存储单元的数据传输与数据存储的控制;
所述接口装置为:标准PCIE接口。
条款A45.一种非易失性计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现条款A1至条款A20中任意一项所述的神经网络的数据量化处理方法。
随着人工智能算法的复杂度提高,待处理数据的数据量和数据维度也在不断增大,而传统的循环神经网络算法通常采用浮点数据格式来执行循环神经网络运算,这就使得不断增大的数据量等对运算装置的数据处理效率、存储装置的存储容量及访存效率等提出了较大的挑战。为解决上述问题,可以对循环神经网络运算过程涉及的运算数据进行量化,即将浮点表示的运算数据转化为定点表示的运算数据,从而减小存储装置的存储容量及访存效率,并提高运算装置的运算效率。但传统的量化方法是在循环神经网络的整个训练过程中均采用相同的数据位宽和量化参数(如小数点的位置)对循环神经网络的不同运算数据进行量化,由于不同的运算数据之间具有差异性,或者,训练过程中不同阶段的运算数据具有差异性,使得采用上述量化方法进行量化时,往往会导致精度不够,从而会影响运算结果。
基于此,本公开提供了一种循环神经网络的量化参数调整方法,该方法可以应用于包含存储器110和处理器120的量化参数调整装置中。图3-1为该量化参数调整装置100’的结构框图,其中,该量化参数调整装置100’的处理器120可以是通用处理器,该量化参数调整装置100’的处理器120也可以是人工智能处理器,该量化参数调整装置100的处理器还可以包含通用处理器和人工智能处理器,此处不作具体限定。该存储器110可以用于存储循环神经网络运算过程中的运算数据,该运算数据可以是神经元数据、权值数据或梯度数据中的一种或多种。该存储器110还可以用于存储计算机程序,该计算机程序被上述的处理器120执行时,能够实现本公开实施例中的量化参数调整方法。该方法能够应用于循环神经网络的训练或微调过程中,并根据循环神经网络的训练或微调过程的不同阶段的运算数据的分布特性,动态地调整运算数据的量化参数,从而提高循环神经网络的量化过程的精度,进而保证运算结果的准确性和可靠性。
若无特别说明,所述人工智能处理器可以是任何适当的硬件处理器,比如CPU、GPU、FPGA、DSP和ASIC等等。若无特别说明,所述存储器可以是任何适当的磁存储介质或者磁光存储介质,比如,阻变式存储器RRAM(Resistive Random Access Memory)、动态随机存取存储器DRAM(Dynamic Random Access Memory)、静态随机存取存储器SRAM(Static Random-Access Memory)、增强动态随机存取存储器EDRAM(Enhanced Dynamic Random Access Memory)、高带宽内存HBM(High-Bandwidth Memory)或混合存储立方HMC(Hybrid Memory Cube)等等。
为更好地理解本公开的内容,以下首先介绍本公开实施例中量化过程及量化过程涉及到的量化参数。
本公开实施例中,量化是指将第一数据格式的运算数据转化为第二数据格式的运算数据。其中,该第一数据格式的运算数据可以是浮点表示的运算数据,该第二数据格式的运算数据可以是定点表示的运算数据。由于浮点表示的运算数据通常会占用较大的存储空间,因此通过将浮点表示的运算数据转换为定点表示的运算数据,可以节约存储空间,并提高运算数据的访存效率及运算效率等。
可选地,量化过程中的量化参数可以包括点位置和/或缩放系数,其中,点位置是指量化后的运算数据中小数点的位置。缩放系数是指量化数据的最大值与待量化数据的最大绝对值之间的比值。进一步地,量化参数还可以包括偏移量,偏移量是针对非对称的待量化数据而言,是指该待量化数据中多个元素的中间值,具体地,偏移量可以是待量化数据中多个元素的中点值。当该待量化数据为对称的待量化数据时,量化参数可以不包含偏移量,此时可以根据该待量化数据确定点位置和/或缩放系数等量化参数。
图3-2示出本公开一实施例的待量化数据和量化数据的对应关系示意图，如图3-2所示，待量化数据为相对于原点对称的数据，假设Z_1为待量化数据中元素的绝对值的最大值，待量化数据对应的数据位宽为n，A为用数据位宽n对待量化数据进行量化后的量化数据可以表示的最大值，A为2^s(2^(n-1)−1)，A需要包含Z_1，且Z_1要大于A/2，因此有公式(2-1)的约束：
2^s(2^(n-1)−1) ≥ Z_1 > 2^(s-1)(2^(n-1)−1)   公式(2-1)
处理器可以根据待量化数据中的绝对值最大值Z1和数据位宽n,计算得到点位置s。例如,可以利用如下公式(2-2)计算得到待量化数据对应的点位置s:
s = ceil(log2(Z_1/(2^(n-1)−1)))   公式(2-2)
其中,ceil为向上取整,Z 1为待量化数据中的绝对值最大值,s为点位置,n为数据位宽。
此时,当采用点位置s对待量化数据进行量化时,浮点表示的待量化数据F x可以表示为:F x≈I x×2 s,其中,I x是指量化后的n位二进制表示值,s表示点位置。其中,该待量化数据对应的量化数据为:
I_x = round(F_x/2^s)   公式(2-3)
其中,s为点位置,I x为量化数据,F x为待量化数据,round为进行四舍五入的取整运算。可以理解的是,也可以采用其他的取整运算方法,例如采用向上取整、向下取整、向零取整等取整运算,替换公式(2-3)中的四舍五入的取整运算。可以理解的是,在数据位宽一定的情况下,根据点位置量化得到的量化数据中,小数点后的位数越多,量化数据的量化精度越大。
进一步地,该待量化数据对应的中间表示数据F x1可以是:
F_x1 = round(F_x/2^s)×2^s   公式(2-4)
其中,s为根据上述公式(2-2)确定的点位置,F x为待量化数据,round为进行四舍五入的取整运算。F x1可以是对上述的量化数据I x进行反量化获得的数据,该中间表示数据F x1的数据表示格式与上述的待量化数据F x的数据表示格式一致,该中间表示数据F x1可以用于计算量化误差,详见下文。其中,反量化是指量化的逆过程。
可选地，缩放系数可以包括第一缩放系数，该第一缩放系数可以按照如下方式进行计算：
f_1 = Z_1/A   公式(2-5)
其中,Z1为待量化数据中的绝对值最大值,A为用数据位宽n对待量化数据进行量化后数据的量化数据可以表示的最大值,A为2 s(2 n-1-1)。
此时,处理器可以采用点位置和第一缩放系数结合的方式对待量化数据F x进行量化,获得量化数据:
I_x = round(F_x/(2^s×f_1))   公式(2-6)
其中,s为根据上述公式(2-2)确定的点位置,f 1为第一缩放系数,I x为量化数据,F x为待量化数据,round为进行四舍五入的取整运算。可以理解的是,也可以采用其他的取整运算方法,例如采用向上取整、向下取整、向零取整等取整运算,替换公式(2-6)中的四舍五入的取整运算。
进一步地,该待量化数据对应的中间表示数据F x1可以是:
F_x1 = round(F_x/(2^s×f_1))×2^s×f_1   公式(2-7)
其中,s为根据上述公式(2-2)确定的点位置,f 1为缩放系数,F x为待量化数据,round为进行四舍五入的取整运算。F x1可以是对上述的量化数据I x进行反量化获得的数据,该中间表示数据F x1的数据表示格式与上述的待量化数据F x的数据表示格式一致,该中间表示数据F x1可以用于计算量化误差,详见下文。其中,反量化是指量化的逆过程。
可选地,该缩放系数还可以包括第二缩放系数,该第二缩放系数可以按照如下方式进行计算:
f_2 = Z_1/(2^(n-1)−1)   公式(2-8)
处理器可以单独使用第二缩放系数对待量化数据F x进行量化,获得量化数据:
I_x = round(F_x/f_2)   公式(2-9)
其中,f 2为第二缩放系数,I x为量化数据,F x为待量化数据,round为进行四舍五入的取整运算。可以理解的是,也可以采用其他的取整运算方法,例如采用向上取整、向下取整、向零取整等取整运算,替换公式(2-9)中的四舍五入的取整运算。可以理解的是,在数据位宽一定的情况下,采用不同的缩放系数,可以调整量化后数据的数值范围。
进一步地,该待量化数据对应的中间表示数据F x1可以是:
F_x1 = round(F_x/f_2)×f_2   公式(2-10)
其中,f 2为第二缩放系数,F x为待量化数据,round为进行四舍五入的取整运算。F x1可以是对上述的量化数据I x进行反量化获得的数据,该中间表示数据F x1的数据表示格式与上述的待量化数据F x的数据表示格式一致,该中间表示数据F x1可以用于计算量化误差,详见下文。其中,反量化是指量化的逆过程。
进一步地,上述第二缩放系数可以是根据点位置和第一缩放系数f 1确定。即第二缩放系数可以按照如下公式进行计算:
f 2=2 s×f 1    公式(2-11)
其中,s为根据上述公式(2-2)确定的点位置,f 1是按照上述公式(2-5)计算获得的第一缩放系数。
可选地,本公开实施例的量化方法不仅能够实现对称数据的量化,还可以实现对非对称数据的量化。此时,处理器可以将非对称的数据转化为对称数据,以避免数据的“溢出”。具体地,量化参数还可以包括偏移量,该偏移量可以是待量化数据的中点值,该偏移量可以用于表示待量化数据的中点值相对于原点的偏移量。图3-3示出本公开一实施例的待量化数据的转换示意图,如图3-3所示,处理器可以对待量化数据的数据分布进行统计,获得待量化数据中所有元素中的最小值Z min,以及该待量化数据中所有元素中的最大值Z max,之后处理器可以根据该最小值Z min和最大值Z max计算获得上述偏移量。具体的偏移量计算方式如下:
o = (Z_min + Z_max)/2   公式(2-12)
其中,o表示偏移量,Z min表示待量化数据所有元素中的最小值,Z max表示待量化数据所有元素中的最大值。
进一步地,处理器可以根据该待量化数据所有元素中的最小值Z min和最大值Z max确定该待量化数据中的绝对值最大值Z 2
Z_2 = (Z_max − Z_min)/2   公式(2-13)
这样,处理器可以根据偏移量o将待量化数据进行平移,将非对称的待量化数据转化为对称的待量化数据,如图3-3所示。处理器还可以根据该待量化数据中的绝对值最大值Z 2进一步确定点位置s,其中,点位置可以按照如下公式进行计算:
s = ceil(log2(Z_2/(2^(n-1)−1)))   公式(2-14)
其中,ceil为向上取整,s为点位置,n为数据位宽。
之后,处理器可以根据该偏移量及其对应的点位置对待量化数据进行量化,获得量化数据:
I_x = round((F_x − o)/2^s)   公式(2-15)
其中,s为根据上述公式(2-14)确定的点位置,o为偏移量,I x为量化数据,F x为待量化数据,round为进行四舍五入的取整运算。可以理解的是,也可以采用其他的取整运算方法,例如采用向上取整、向下取整、向零取整等取整运算,替换公式(2-15)中的四舍五入的取整运算。
进一步地,该待量化数据对应的中间表示数据F x1可以是:
F_x1 = round((F_x − o)/2^s)×2^s + o   公式(2-16)
其中,s为根据上述公式(2-14)确定的点位置,o为偏移量,F x为待量化数据,round为进行四舍五入的取整运算。F x1可以是对上述的量化数据I x进行反量化获得的数据,该中间表示数据F x1的数据表示格式与上述的待量化数据F x的数据表示格式一致,该中间表示数据F x1可以用于计算量化误差,详见下文。其中,反量化是指量化的逆过程。
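公式(2-12)~(2-16)所述的带偏移量的量化与反量化过程，可以用如下草图示意（假设待量化数据为NumPy数组，函数名为示例自拟，并非本公开限定的实现方式）：

```python
import math
import numpy as np

def asymmetric_quantize(fx: np.ndarray, n: int):
    z_min, z_max = float(fx.min()), float(fx.max())
    o = (z_min + z_max) / 2.0                           # 公式(2-12)：偏移量
    z2 = (z_max - z_min) / 2.0                          # 公式(2-13)：平移后的绝对值最大值
    s = math.ceil(math.log2(z2 / (2 ** (n - 1) - 1)))   # 公式(2-14)：点位置
    ix = np.round((fx - o) / (2.0 ** s))                # 公式(2-15)：量化数据
    fx1 = ix * (2.0 ** s) + o                           # 公式(2-16)：中间表示（反量化）数据
    return ix, fx1, s, o
```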
进一步可选地,处理器可以根据该待量化数据中的绝对值最大值Z 2进一步确定点位置s和第一缩放系数f 1,其中,点位置s具体计算方式可参见上述公式(2-14)。第一缩放系数f 1可以按照如下公式进行计算:
f_1 = Z_2/(2^s(2^(n-1)−1))   公式(2-17)
处理器可以根据偏移量及其对应的第一缩放系数f 1和点位置s,对待量化数据进行量化,获得量化数据:
I_x = round((F_x − o)/(2^s×f_1))   公式(2-18)
其中,f 1为第一缩放系数,s为根据上述公式(2-14)确定的点位置,o为偏移量,I x为量化数据,F x为待量化数据,round为进行四舍五入的取整运算。可以理解的是,也可以采用其他的取整运算方法,例如采用向上取整、向下取整、向零取整等取整运算,替换公式(2-18)中的四舍五入的取整运算。
进一步地,该待量化数据对应的中间表示数据F x1可以是:
F_x1 = round((F_x − o)/(2^s×f_1))×2^s×f_1 + o   公式(2-19)
其中,f 1为第一缩放系数,s为根据上述公式(2-14)确定的点位置,o为偏移量,F x为待量化数据,round为进行四舍五入的取整运算。F x1可以是对上述的量化数据I x进行反量化获得的数据,该中间表示数据F x1的数据表示格式与上述的待量化数据F x的数据表示格式一致,该中间表示数据F x1可以用于计算量化误差,详见下文。其中,反量化是指量化的逆过程。
可选地,该缩放系数还可以包括第二缩放系数,该第二缩放系数可以按照如下方式进行计算:
f_2 = Z_2/(2^(n-1)−1)   公式(2-20)
处理器可以单独使用第二缩放系数对待量化数据F x进行量化,获得量化数据:
I_x = round((F_x − o)/f_2)   公式(2-21)
其中,f 2为第二缩放系数,I x为量化数据,F x为待量化数据,round为进行四舍五入的取整运算。可以理解的是,也可以采用其他的取整运算方法,例如采用向上取整、向下取整、向零取整等取整运算,替换公式(2-21)中的四舍五入的取整运算。可以理解的是,在数据位宽一定的情况下,采用不同的缩放系数,可以调整量化后数据的数值范围。
进一步地,该待量化数据对应的中间表示数据F x1可以是:
F_x1 = round((F_x − o)/f_2)×f_2 + o   公式(2-22)
其中,f 2为第二缩放系数,F x为待量化数据,round为进行四舍五入的取整运算。F x1可以是对上述的量化数据I x进行反量化获得的数据,该中间表示数据F x1的数据表示格式与上述的待量化数据F x的数据表示格式一致,该中间表示数据F x1可以用于计算量化误差,详见下文。其中,反量化是指量化的逆过程。
进一步地,上述第二缩放系数可以根据点位置和第一缩放系数f 1确定。即第二缩放系数可以按照如下公式进行计算:
f 2=2 s×f 1    公式(2-23)
其中,s为根据上述公式(2-14)确定的点位置,f 1是按照上述公式(2-17)计算获得的第一缩放系数。
可选地,处理器还可以根据偏移量o对待量化数据进行量化,此时点位置s和/或缩放系数可以为预设值。此时,处理器根据偏移量对待量化数据进行量化,获得量化数据:
I x=round(F x-o)    公式(2-24)
其中,o为偏移量,I x为量化数据,F x为待量化数据,round为进行四舍五入的取整运算。可以理解的是,也可以采用其他的取整运算方法,例如采用向上取整、向下取整、向零取整等取整运算,替 换公式(2-24)中的四舍五入的取整运算。可以理解的是,在数据位宽一定的情况下,采用不同的偏移量,可以调整量化后数据的数值与量化前数据之间的偏移量。
进一步地,该待量化数据对应的中间表示数据F x1可以是:
F x1=round(F x-o)+o   公式(2-25)
其中,o为偏移量,F x为待量化数据,round为进行四舍五入的取整运算。F x1可以是对上述的量化数据I x进行反量化获得的数据,该中间表示数据F x1的数据表示格式与上述的待量化数据F x的数据表示格式一致,该中间表示数据F x1可以用于计算量化误差,详见下文。其中,反量化是指量化的逆过程。
本公开的量化操作不仅可以用于上述浮点数据的量化,还可以用于实现定点数据的量化。可选地,该第一数据格式的运算数据也可以是定点表示的运算数据,该第二数据格式的运算数据可以是定点表示的运算数据,且第二数据格式的运算数据的数据表示范围小于第一数据格式的数据表示范围,第二数据格式的小数点位数大于第一数据格式的小数点位数,即第二数据格式的运算数据相较于第一数据格式的运算数据具有更高的精度。例如,该第一数据格式的运算数据为占用16位的定点数据,该第二数据格式可以是占用8位的定点数据。本公开实施例中,可以通过定点表示的运算数据进行量化处理,从而进一步减小运算数据占用的存储空间,提高运算数据的访存效率及运算效率。
本公开一实施例的循环神经网络的量化参数调整方法,能够应用于循环神经网络的训练或微调过程中,从而在循环神经网络的训练或微调过程中,动态地调整循环神经网络运算过程中运算数据的量化参数,以提高该循环神经网络的量化精度。其中,循环神经网络可以是深度循环神经网络或卷积循环神经网络等等,此处不作具体限定。
应当清楚的是,循环神经网络的训练(Training)是指对循环神经网络(该循环神经网络的权值可以是随机数)进行多次迭代运算(iteration),使得循环神经网络的权值能够满足预设条件的过程。其中,一次迭代运算一般包括一次正向运算、一次反向运算和一次权值更新运算。正向运算是指根据循环神经网络的输入数据进行正向推理,获得正向运算结果的过程。反向运算是根据正向运算结果与预设参考值确定损失值,并根据该损失值确定权值梯度值和/或输入数据梯度值的过程。权值更新运算是指根据权值梯度值调整循环神经网络的权值的过程。具体地,循环神经网络的训练过程如下:处理器可以采用权值为随机数的循环神经网络对输入数据进行正向运算,获得正向运算结果。之后处理器根据该正向运算结果与预设参考值确定损失值,根据该损失值确定权值梯度值和/或输入数据梯度值。最后,处理器可以根据权值梯度值更新循环神经网络的梯度值,获得新的权值,完成一次迭代运算。处理器循环执行多次迭代运算,直至循环神经网络的正向运算结果满足预设条件。例如,当循环神经网络的正向运算结果收敛于预设参考值时,结束训练。或者,当循环神经网络的正向运算结果与预设参考值确定的损失值小于或等于预设精度时,结束训练。
微调是指对循环神经网络(该循环神经网络的权值已经处于收敛状态而非随机数)进行多次迭代运算,以使得循环神经网络的精度能够满足预设需求的过程。该微调过程与上述训练过程基本一致,可以认为是对处于收敛状态的循环神经网络进行重训练的过程。推理(Inference)是指采用权值满足预设条件的循环神经网络进行正向运算,从而实现识别或分类等功能的过程,如采用循环神经网络进行图像识别等等。
本公开实施例中,在上述循环神经网络的训练或微调过程中,可以在循环神经网络运算的不同阶段采用不同的量化参数对循环神经网络的运算数据进行量化,并根据量化后的数据进行迭代运算,从而可以减小循环神经网络运算过程中的数据存储空间,提高数据访存效率及运算效率。图3-4示出本公开一实施例的循环神经网络的量化参数调整方法的流程图。如图3-4所示,上述方法可以包括步骤S100至步骤S200。
在步骤S100中,获取待量化数据的数据变动幅度。
可选地,处理器可以直接读取该待量化数据的数据变动幅度,该待量化数据的数据变动幅度可以 是用户输入的。
可选地,处理器也可以根据当前检验迭代的待量化数据和历史迭代的待量化数据,计算获得上述的待量化数据的数据变动幅度,其中当前检验迭代是指当前执行的迭代运算,历史迭代是指在当前检验迭代之前执行的迭代运算。例如,处理器可以获取当前检验迭代的待量化数据中元素的最大值以及元素的平均值,以及各个历史迭代的待量化数据中元素的最大值以及元素的平均值,并根据各次迭代中元素的最大值和元素的平均值,确定待量化数据的变动幅度。若当前检验迭代的待量化数据中元素的最大值与预设数量的历史迭代的待量化数据中元素的最大值较为接近,且当前检验迭代的待量化数据中元素的平均值与预设数量的历史迭代的待量化数据中元素的平均值较为接近时,则可以确定上述的待量化数据的数据变动幅度较小。否则,则可以确定待量化数据的数据变动幅度较大。再如,该待量化数据的数据变动幅度可以采用待量化数据的滑动平均值或方差等进行表示,此处不作具体限定。
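上述“最大值与平均值是否接近”的判据可以用如下草图示意（rtol为示例性的相对容差，判据的具体形式以本公开为准，函数名为示例自拟）：

```python
import numpy as np

def data_variation_is_small(cur: np.ndarray, history: list, rtol: float = 0.1) -> bool:
    # 比较当前检验迭代与各历史迭代中元素最大值、平均值是否接近
    cur_max, cur_mean = float(np.max(cur)), float(np.mean(cur))
    for h in history:
        if not np.isclose(cur_max, float(np.max(h)), rtol=rtol):
            return False
        if not np.isclose(cur_mean, float(np.mean(h)), rtol=rtol):
            return False
    return True  # 均接近，认为待量化数据的数据变动幅度较小
```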
本公开实施例中,该待量化数据的数据变动幅度可以用于确定是否需要调整待量化数据的量化参数。例如,若待量化数据的数据变动幅度较大,则可以说明需要及时调整量化参数,以保证量化精度。若待量化数据的数据变动幅度较小,则当前检验迭代及其之后一定数量的迭代可以沿用历史迭代的量化参数,从而可以避免频繁的调整量化参数,提高量化效率。
其中,每次迭代涉及至少一个待量化数据,该待量化数据可以是浮点表示的运算数据,也可以是定点表示的运算数据。可选地,每次迭代的待量化数据可以是神经元数据、权值数据或梯度数据中的至少一种,梯度数据还可以包括神经元梯度数据和权值梯度数据等。
在步骤S200中,根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,其中,所述第一目标迭代间隔包括至少一次迭代,所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
可选地,该量化参数可以包括上述的点位置和/或缩放系数,其中,缩放系数可以包括第一缩放系数和第二缩放系数。具体的点位置计算方法可以参见上述的公式(2-2),缩放系数的计算方法可参见上述的公式(2-5)或(2-8),此处不再赘述。可选地,该量化参数还可以包括偏移量,该偏移量的计算方法可参见上述的公式(2-12);更进一步地,处理器还可以根据按照公式(2-14)确定点位置,根据上述的公式(2-17)或(2-20)确定缩放系数。本申请实施例中,处理器可以根据确定的目标迭代间隔,更新上述的点位置、缩放系数或偏移量中的至少一种,以调整该循环神经网络运算中的量化参数。也就是说,该循环神经网络运算中的量化参数可以根据循环神经网络运算中待量化数据的数据变动幅度进行更新,从而可以保证量化精度。
可以理解的是,通过对循环神经网络训练或微调过程中的运算数据的变化趋势进行统计和分析,可以得到待量化数据的数据变动曲线。图3-5a示出本公开一实施例的待量化数据在运算过程中的变动趋势图,如图3-5a所示,根据该数据变动曲线可以获知,在循环神经网络训练或微调的初期,不同迭代的待量化数据的数据变动较为剧烈,随着训练或微调运算的进行,不同迭代的待量化数据的数据变动逐渐趋于平缓。因此,在循环神经网络训练或微调的初期,可以较为频繁地调整量化参数;在循环神经网络训练或微调的中期和后期,可以间隔多次迭代或周期再调整量化参数。本公开的方法即是通过确定合适的迭代间隔,以取得量化精度和量化效率的平衡。
具体地,处理器可以通过待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据该第一目标迭代间隔调整循环神经网络运算中的量化参数。可选地,该第一目标迭代间隔可以随着待量化数据的数据变动幅度的减小而增大。也就是说,该待量化数据的数据变动幅度越大时,则该第一目标迭代间隔越小,表明量化参数的调整越频繁。该待量化数据的数据变动幅度越小时,则该第一目标迭代间隔越大,表明量化参数的调整越不频繁。当然,在其他实施例中,上述的第一目标迭代间隔还可以是超参数,例如,该第一目标迭代间隔可以是用户自定义设置的。
可选地,上述的权值数据、神经元数据及梯度数据等各种待量化数据可以分别具有的不同的迭代间隔。相应地,处理器可以分别获取各种待量化数据对应的数据变动幅度,以分别根据每种待量化数据的数据变动幅度,确定相应种类的待量化数据对应的第一目标迭代间隔。也就是说,各种待量化数据的量化过程可以是异步进行的。本公开实施例中,由于不同种类的待量化数据之间具有差异性,因 此可以采用不同的待量化数据的数据变动幅度,确定相应的第一目标迭代间隔,并分别根据相应的第一目标迭代间隔确定对应的量化参数,从而可以保证待量化数据的量化精度,进而保证循环神经网络的运算结果的正确性。
当然,在其他实施例中,针对不同种类的待量化数据还可以确定相同的目标迭代间隔(包括第一目标迭代间隔、预设迭代间隔、第二目标迭代间隔中的任一个),以根据该目标迭代间隔调整相应待量化数据对应的量化参数。例如,处理器可以分别获取各种待量化数据的数据变动幅度,并根据最大的待量化数据的数据变动幅度确定目标迭代间隔,并根据该目标迭代间隔分别确定各种待量化数据的量化参数。更进一步地,不同种类的待量化数据还可以采用相同的量化参数。
进一步可选地,上述的循环神经网络可以包括至少一个运算层,该待量化数据可以是各个运算层涉及的神经元数据、权值数据或梯度数据中的至少一种。此时,处理器可以获得当前运算层涉及的待量化数据,并根据上述方法确定当前运算层中各种待量化数据的数据变动幅度及对应的第一目标迭代间隔。
可选地,处理器可以在每次迭代运算过程中均确定一次上述的待量化数据的数据变动幅度,并根据相应的待量化数据的数据变动幅度确定一次第一目标迭代间隔。也就是说,处理器可以在每次迭代均计算一次第一目标迭代间隔。具体的第一目标迭代间隔的计算方式可参见下文中的描述。进一步地,处理器可以根据预设条件从各次迭代中选定检验迭代,在各次检验迭代处确定待量化数据的变动幅度,并根据检验迭代对应的第一目标迭代间隔对量化参数等的更新调整。此时,若该迭代不是选定检验迭代,处理器可以忽略该迭代对应的第一目标迭代间隔。
可选地,每个目标迭代间隔可以对应一检验迭代,该检验迭代可以是该目标迭代间隔的起始迭代,也可以是该目标迭代间隔的终止迭代。处理器可以在各个目标迭代间隔的检验迭代处调整循环神经网络的量化参数,以实现根据目标迭代间隔调整循环神经网络运算的量化参数。其中,检验迭代可以是用于核查当前量化参数是否满足待量化数据的需求的时间点。该调整前的量化参数可以与调整后的量化参数相同,也可以与调整后的量化参数不同。可选地,相邻的检验迭代之间的间隔可以大于或等于一个目标迭代间隔。
例如,该目标迭代间隔可以从当前检验迭代开始计算迭代数量,该当前检验迭代可以是该目标迭代间隔的起始迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定目标迭代间隔的迭代间隔为3,则处理器可以确定该目标迭代间隔包括3次迭代,分别为第100次迭代、第101次迭代和第102次迭代。处理器可以在该第100次迭代处调整循环神经网络运算中的量化参数。其中,当前检验迭代是处理器当前执行量化参数更新调整时对应的迭代运算。
可选地,目标迭代间隔还可以是从当前检验迭代的下一次迭代开始计算迭代数量,该当前检验迭代可以是当前检验迭代之前的上一迭代间隔的终止迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定目标迭代间隔的迭代间隔为3,则处理器可以确定该目标迭代间隔包括3次迭代,分别为第101次迭代、第102次迭代和第103次迭代。处理器可以在该第100次迭代和第103次迭代处调整循环神经网络运算中的量化参数。本公开对目标迭代间隔的确定方式不作具体限定。
图3-5b示出本公开一实施例的循环神经网络的展开示意图。如图3-5b所示,给出了循环神经网络的隐藏层的展开示意图,t-1,t,t+1表示时间序列。X表示输入的样本。St表示样本在时间t处的记忆,St=f(W*St-1+U*Xt)。W表示输入的权重,U表示此刻输入的样本的权重,V表示输出的样本权重。由于不同的循环神经网络展开的层数不同,不同在对循环神经网络进行量化参数更新时,不同周期中所包含的迭代总数是不同的。图3-5c示出本公开一实施例的循环神经网络的周期示意图。如图3-5c所示,iter 1、iter 2、iter 3、iter 4为循环神经网络的三个周期,其中,第一个周期iter 1中包括t 0、t 1、t 2、t 3四个迭代。第二个周期iter 2中包括t 0、t 1两个迭代。第三个周期iter 3中包括t 0、t 1、t 2三个迭代。第四个周期iter 2中包括t 0、t 1、t 2、t 3、t 4五个迭代。在计算循环神经网络在何时能够更新量化参数时,需要结合不同周期中迭代的总数进行计算。
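在各周期迭代数不同的情况下，可以先将各周期的迭代数累加为全局迭代序号，再按目标迭代间隔确定各次更新（检验）迭代的位置。下面给出一个示意性草图（仅为示意，实际的换算方式以本公开后文为准）：

```python
def update_iterations(period_lengths, interval):
    # period_lengths：各周期包含的迭代数（如图3-5c中为[4, 2, 3, 5]）
    # interval：目标迭代间隔；返回全局迭代序号下的各次更新迭代位置
    total = sum(period_lengths)
    return list(range(0, total, interval))

print(update_iterations([4, 2, 3, 5], 3))  # [0, 3, 6, 9, 12]
```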
在一个实施例中,根据上文中点位置、缩放系数及偏移量的计算公式可以看出,量化参数往往与 待量化数据相关,因此,上述操作S100中,待量化数据的数据变动幅度也可以通过量化参数的变动幅度间接确定,该待量化数据的数据变动幅度可以通过量化参数的变动幅度进行表征。具体地,图3-6示出本公开一实施例的循环神经网络的参数调整方法的流程图,如图3-6所示,上述操作S100可以包括操作S110,操作S200可以包括操作S210(详见下文描述)。
S110、获取点位置的变动幅度;其中,所述点位置的变动幅度能够用于表征所述待量化数据的数据变动幅度,所述点位置的变动幅度与所述待量化数据的数据变动幅度正相关。
可选地,点位置的变动幅度能够间接反应待量化数据的变动幅度。该点位置的变动幅度可以是根据当前检验迭代的点位置和至少一次历史迭代的点位置确定的。其中,当前检验迭代的点位置及各次历史迭代的点位置可以根据公式(2-2)进行确定。当然,当前检验迭代的点位置及各次历史迭代的点位置还可以根据公式(2-14)进行确定。
例如,处理器还可以计算当前检验迭代的点位置和历史迭代的点位置的方差等,并根据该方差确定点位置的变动幅度。再如,处理器可以根据当前检验迭代的点位置和历史迭代的点位置的平均值,确定点位置的变动幅度。具体地,如图3-7所示,上述操作S110可以包括操作S111至操作S113,操作S210可以包括操作S211(详见下文描述)。
S111、根据所述当前检验迭代之前的上一检验迭代对应的点位置,以及所述上一检验迭代之前的历史迭代对应的点位置,确定第一均值。其中,上一检验迭代为上一次调整所述量化参数时对应的迭代,上一检验迭代与所述当前检验迭代之间间隔至少一个迭代间隔。
可选地,至少一次历史迭代可以分属于至少一个迭代间隔中,每个迭代间隔可以对应有一个检验迭代,相邻的两个检验迭代可以具有一个迭代间隔。上述操作S111中的上一检验迭代可以是目标迭代间隔之前的上一迭代间隔对应的检验迭代。
可选地,该第一均值可以按照如下公式进行计算:
M1 = a1×s_(t-1) + a2×s_(t-2) + a3×s_(t-3) + … + am×s_1   公式(2-26)
其中,a1~am是指各次迭代的点位置对应的计算权重,s t-1是指上一检验迭代对应的点位置,s t-2,s t-3…s 1是指上一检验迭代之前的历史迭代对应的点位置,M1是指上述的第一均值。进一步地,根据数据的分布特性,历史迭代与该上一检验迭代距离越远,对该上一检验迭代附近的迭代的点位置的分布及变动幅度影响越小,因此,上述计算权重可以按照a1~am的顺序依次减小。
例如,上一检验迭代为循环神经网络运算的第100次迭代,历史迭代可以是第1次迭代至第99次迭代,则处理器可以获得该第100次迭代的点位置(即s t-1),并获得该第100次迭代之前的历史迭代的点位置,即s 1可以指循环神经网络的第1次迭代对应的点位置……,s t-3可以指循环神经网络的第98次迭代对应的点位置,s t-2可以指循环神经网络的第99次迭代对应的点位置。进一步地,处理器可以根据上述公式计算获得第一均值。
更进一步地,该第一均值可以根据各个迭代间隔对应的检验迭代的点位置进行计算。例如,该第一均值可以按照如下公式进行计算:
M1 = a1×s_(t-1) + a2×s_(t-2) + a3×s_(t-3) + … + am×s_1
其中,a1~am是指各次检验迭代的点位置对应的计算权重,s t-1是指上一检验迭代对应的点位置,s t-2,s t-3…s 1是指上一检验迭代之前的预设数量的迭代间隔的检验迭代对应的点位置,M1是指上述的第一均值。
例如,上一检验迭代为循环神经网络运算的第100次迭代,历史迭代可以是第1次迭代至第99次迭代,该99次历史迭代可以分属于11个迭代间隔。比如,第1次迭代至第9次迭代属于第1个迭代间隔,第10次迭代至第18次迭代属于第2个迭代间隔,……,第90次迭代至第99次迭代属于第11个迭代间隔。则处理器可以获得该第100次迭代的点位置(即s t-1),并获得该第100次迭代之前的迭代间隔中检验迭代的点位置,即s 1可以指循环神经网络的第1个迭代间隔的检验迭代对应的点位置(比如s 1可以指循环神经网络的第1次迭代对应的点位置),……,s t-3可以指循环神经网络的第10个迭代间隔的检验迭代对应的点位置(比如s t-3可以指循环神经网络的第81次迭代对应的点位置),s t-2可以指循环神经网络的第11个迭代间隔的检验迭代对应的点位置(比如,s t-2可以指循环神经网络的第90次迭代对应 的点位置)。进一步地,处理器可以根据上述公式计算获得第一均值M1。
本公开实施例中,为方便举例说明,假定该迭代间隔包含的迭代数量相同。而在实际使用过程中,如图3-5c所示,该循环神经网络中迭代间隔包含的迭代数量不相同。可选地,该迭代间隔包含的迭代数量随迭代的增加而增加,即随着循环神经网络训练或微调的进行,迭代间隔可以越来越大。
再进一步地,为进一步简化计算,降低数据占用的存储空间,上述第一均值M1可以采用如下公式进行计算:
M1 = α×s_(t-1) + (1−α)×M0   公式(2-27)
其中,α是指上一检验迭代对应的点位置的计算权重,s t-1是指上一检验迭代对应的点位置,M0是指该上一检验迭代之前的检验迭代对应的滑动平均值,该M0的具体计算方式可参照上述的M1的计算方式,此处不再赘述。
S112、根据当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值。其中,当前检验迭代对应的点位置可以根据当前检验迭代的目标数据位宽和待量化数据确定。
可选地,该第二均值M2可以按照如下公式进行计算:
M2 = b1×s_t + b2×s_(t-1) + b3×s_(t-2) + … + bm×s_1   公式(2-28)
其中,b1~bm是指各次迭代的点位置对应的计算权重,s t是指当前检验迭代对应的点位置,s t-1,s t-2…s 1是指当前检验迭代之前的历史迭代对应的点位置,M2是指上述的第二均值。进一步地,根据数据的分布特性,历史迭代与该当前检验迭代距离越远,对该当前检验迭代附近的迭代的点位置的分布及变动幅度影响越小,因此,上述计算权重可以按照b1~bm的顺序依次减小。
例如,当前检验迭代为循环神经网络运算的第101次迭代,该当前检验迭代之前的历史迭代是指第1次迭代至第100次迭代。则处理器可以获得该第101次迭代的点位置(即s t),并获得该第101次迭代之前的历史迭代的点位置,即s 1可以指循环神经网络的第1次迭代对应的点位置……,s t-2可以指循环神经网络的第99次迭代对应的点位置,s t-1可以指循环神经网络的第100次迭代对应的点位置。进一步地,处理器可以根据上述公式计算获得第二均值M2。
可选地,该第二均值可以根据各个迭代间隔对应的检验迭代的点位置进行计算。具体地,图3-8示出本公开一实施例中第二均值的确定方法的流程图,如图3-8所示,上述操作S112可以包括如下操作:
S1121、获取预设数量的中间滑动平均值,其中,各个所述中间滑动平均值是根据所述当前检验迭代之前所述预设数量的检验迭代确定,所述检验迭代为调整所述神经网络量化过程中的参数时对应的迭代;
S1122、根据所述当前检验迭代的点位置以及所述预设数量的中间滑动平均值,确定所述第二均值。
例如,该第二均值可以按照如下公式进行计算:
M2 = b1×s_t + b2×s_(t-1) + b3×s_(t-2) + … + bm×s_1
其中,b1~bm是指各次迭代的点位置对应的计算权重,s t是指当前检验迭代对应的点位置,s t-1,s t-2…s 1是指当前检验迭代之前的检验迭代对应的点位置,M2是指上述的第二均值。
例如,当前检验迭代为第100次迭代,历史迭代可以是第1次迭代至第99次迭代,该99次历史迭代可以分属于11个迭代间隔。比如,第1次迭代至第9次迭代属于第1个迭代间隔,第10次迭代至第18次迭代属于第2个迭代间隔,……,第90次迭代至第99次迭代属于第11个迭代间隔。则处理器可以获得该第100次迭代的点位置(即s t),并获得该第100次迭代之前的迭代间隔中检验迭代的点位置,即s 1可以指循环神经网络的第1个迭代间隔的检验迭代对应的点位置(比如s 1可以指循环神经网络的第1次迭代对应的点位置),……,s t-2可以指循环神经网络的第10个迭代间隔的检验迭代对应的点位置(比如s t-2可以指循环神经网络的第81次迭代对应的点位置),s t-1可以指循环神经网络的第11个迭代间隔的检验迭代对应的点位置(比如,s t-1可以指循环神经网络的第90次迭代对应的点位置)。进一步地,处理器可以根据上述公式计算获得第二均值M2。
本公开实施例中,为方便举例说明,假定该迭代间隔包含的迭代数量相同。而在实际使用过程中, 该迭代间隔包含的迭代数量可以不相同。可选地,该迭代间隔包含的迭代数量随迭代的增加而增加,即随着循环神经网络训练或微调的进行,迭代间隔可以越来越大。
更进一步地,为简便计算,降低数据占用的存储空间,处理器可以根据所述当前检验迭代对应的点位置以及所述第一均值,确定所述第二均值,即上述第二均值可以采用如下公式进行计算:
M2 = β×s_t + (1−β)×M1   公式(2-29)
其中,β是指当前检验迭代对应的点位置的计算权重,M1是指上述的第一均值。
S113、根据所述第一均值和所述第二均值确定第一误差,所述第一误差用于表征所述当前检验迭代及所述历史迭代的点位置的变动幅度。
可选地,第一误差可以等于第二均值与上述的第一均值之间的差值的绝对值。具体地,上述的第一误差可以按照如下公式进行计算:
diff_update1 = |M2 − M1| = β|s_t − M1|   公式(2-30)
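公式(2-29)、(2-30)的计算可以用如下草图示意（β为超参数，函数名为示例自拟）：

```python
def first_error(s_t: float, m1: float, beta: float):
    # 公式(2-29)：由当前检验迭代的点位置s_t与第一均值M1计算第二均值M2
    m2 = beta * s_t + (1 - beta) * m1
    # 公式(2-30)：第一误差 diff_update1 = |M2 - M1| = beta * |s_t - M1|
    return m2, abs(m2 - m1)
```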
可选地,上述的当前检验迭代的点位置可以根据当前检验迭代的待量化数据和当前检验迭代对应的目标数据位宽确定,具体的点位置计算方式可以参见上文的公式(2-2)或公式(2-14)。其中,上述当前检验迭代对应的目标数据位宽可以是超参数。进一步可选地,该当前检验迭代对应的目标数据位宽可以是用户自定义输入的。可选地,在循环神经网络训练或微调过程中的待量化数据对应的数据位宽可以是不变的,即同一循环神经网络的同种待量化数据采用同一数据位宽进行量化,例如,针对该循环神经网络在各次迭代中的神经元数据均采用8比特的数据位宽进行量化。
可选地,循环神经网络训练或微调过程中的待量化数据对应的数据位宽为可变的,以保证数据位宽能够满足待量化数据的量化需求。也就是说,处理器可以根据待量化数据,自适应的调整该待量化数据对应的数据位宽,获得该待量化数据对应的目标数据位宽。具体地,处理器可以首先确定当前检验迭代对应的目标数据位宽,之后,处理器可以根据该当前检验迭代对应的目标数据位宽及该当前检验迭代对应的待量化数据,确定当前检验迭代对应的点位置。
图3-9示出本公开一实施例中数据位宽调整方法的流程图,如图3-9所示,上述操作S110可以包括:
S114、根据所述当前检验迭代的待量化数据和所述当前检验迭代的量化数据,确定量化误差,其中,所述当前检验迭代的量化数据是通过对所述当前检验迭代的待量化数据进行量化获得。
可选地,上述处理器可以采用初始数据位宽对待量化数据进行量化,获得上述的量化数据。该当前检验迭代的初始数据位宽可以是超参数,该当前检验迭代的初始数据位宽也可以是根据该当前检验迭代之前的上一检验迭代的待量化数据确定的。
具体地,处理器可以根据当前检验迭代的待量化数据和当前检验迭代的量化数据,确定中间表示数据。可选地,所述中间表示数据与上述的待量化数据的表示格式一致。例如,处理器可以对上述的量化数据进行反量化,获得与待量化数据的表示格式一致的中间表示数据,其中,反量化是指量化的逆过程。例如,该量化数据可以采用上述公式(2-3)获得,处理器还可以按照上述公式(2-4)对量化数据进行反量化,获得相应的中间表示数据,并根据待量化数据和中间表示数据确定量化误差。
进一步地，处理器可以根据待量化数据及其对应的中间表示数据计算获得量化误差。设当前检验迭代的待量化数据为F_x=[z_1, z_2, …, z_m]，该待量化数据对应的中间表示数据为F_x1=[z_1^(n), z_2^(n), …, z_m^(n)]。处理器可以根据该待量化数据F_x及其对应的中间表示数据F_x1确定误差项，并根据该误差项确定量化误差。
可选地,处理器可以根据中间表示数据F x1中各元素的和,以及待量化数据F x中各元素的和确定上述的误差项,该误差项可以是中间表示数据F x1中各元素的和与待量化数据F x中各元素的和的差值。之后,处理器可以根据该误差项确定量化误差。具体的量化误差可以按照如下公式进行确定:
diff_bit = log2( |Σ_i z_i^(n) − Σ_i z_i| / Σ_i|z_i| + 1 )   公式(2-31)
其中,z i为待量化数据中的元素,z i (n)为中间表示数据F x1的元素。
可选地,处理器可以分别计算待量化数据中各元素与中间表示数据F x1中相应元素的差值,获得m个差值,并将该m个差值的和作为误差项。之后,处理器可以根据该误差项确定量化误差。具体的量 化误差可以按照如下公式确定:
diff_bit = log2( Σ_i|z_i^(n) − z_i| / Σ_i|z_i| + 1 )   公式(2-32)
其中,z i为待量化数据中的元素,z i (n)为中间表示数据F x1的元素。
可选地，上述待量化数据中各元素与中间表示数据F_x1中相应元素的差值可以约等于2^(s-1)，因此，上述量化误差还可以按照如下公式确定：
diff_bit = log2( 2^(s-1)×m / Σ_i|z_i| + 1 )   公式(2-33)
其中,m为目标数据对应的中间表示数据F x1的数量,s为点位置,z i为待量化数据中的元素。
可选地，所述中间表示数据也可以与上述的量化数据的数据表示格式一致，并根据该中间表示数据和量化数据确定量化误差。例如，待量化数据可以表示为：F_x ≈ I_x×2^s，则可以确定出中间表示数据 I_x1 = F_x/2^s。该中间表示数据I_x1可以与上述的量化数据具有相同的数据表示格式。此时处理器可以根据中间表示数据I_x1和上述公式(2-3)计算获得的量化数据I_x确定量化误差。具体的量化误差确定方式可参照上述的公式(2-31)~公式(2-33)。
S115、根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽。
具体地,处理器可以根据该量化误差,自适应地调整当前检验迭代对应的数据位宽,确定该当前检验迭代调整后的目标数据位宽。当该量化误差满足预设条件时,则可以保持当前检验迭代对应的数据位宽不变,即该当前检验迭代的目标数据位宽可以等于初始数据位宽。当量化误差不满足预设条件时,处理器可以调整当前检验迭代的待量化数据对应的数据位宽,获得当前检验迭代对应的目标数据位宽。当处理器采用该目标数据位宽对当前检验迭代的待量化数据进行量化时,量化误差满足上述的预设条件。可选地,上述的预设条件可以是用户设置的预设阈值。
可选地,图3-10示出本公开另一实施例中数据位宽调整方法的流程图,如图3-10所示,上述操作S115可以包括:
S1150、处理器可以判断上述的量化误差是否大于或等于第一预设阈值。
若所述量化误差大于或等于第一预设阈值,则可以执行操作S1151,增大所述当前检验迭代对应的数据位宽,获得当前检验迭代的目标数据位宽。当量化误差小于第一预设阈值时,则可以保持当前检验迭代的数据位宽不变。
进一步可选地,处理器可以经过一次调整获得上述的目标数据位宽。例如,当前检验迭代的初始数据位宽为n1,处理器可以经一次调整确定该目标数据位宽n2=n1+t,其中,t为数据位宽的调整值。其中,采用该目标数据位宽n2对当前检验迭代的待量化数据进行量化时,获得的量化误差可以小于所述第一预设阈值。
进一步可选地,处理器可以经过多次调整获得目标数据位宽,直至量化误差小于第一预设阈值,并将该量化误差小于第一预设阈值时的数据位宽作为目标数据位宽。具体地,若所述量化误差大于或等于第一预设阈值,则根据第一预设位宽步长确定第一中间数据位宽;之后处理器可以根据该第一中间数据位宽对当前检验迭代的待量化数据进行量化,获得量化数据,并根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差小于所述第一预设阈值。处理器可以将该量化误差小于第一预设阈值时对应的数据位宽,作为该目标数据位宽。
例如,当前检验迭代的初始数据位宽为n1,处理器可以采用该初始数据位宽n1对当前检验迭代的待量化数据A进行量化,获得量化数据B1,并根据该待量化数据A和量化数据B1计算获得量化误差C1。在量化误差C1大于或等于第一预设阈值时,处理器确定第一中间数据位宽n2=n1+t1,其中,t1为第一预设位宽步长。之后,处理器可以根据该第一中间数据位宽n2对当前检验迭代的待量化数据进行量化,获得当前检验迭代的量化数据B2,并根据该待量化数据A和量化数据B2计算获得量化误差C2。若该量化误差C2大于或等于第一预设阈值时,处理器确定第一中间数据位宽n2=n1+t1+t1,之后根据该新的第一中间数据位宽对当前检验迭代的待量化数据A进行量化,并计算相应的量化误差,直至量化误差小于第一预设阈值。若量化误差C1小于第一预设阈值,则可以保持该初始数据位宽n1不变。
更进一步地,上述的第一预设位宽步长可以是恒定值,例如,每当量化误差大于第一预设阈值时,则处理器可以将当前检验迭代对应的数据位宽增大相同的位宽值。可选地,上述的第一预设位宽步长也可以是可变值,例如,处理器可以计算量化误差与第一预设阈值的差值,若该量化误差与第一预设阈值的差值越小,则第一预设位宽步长的值越小。
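上述“多次调整直至量化误差小于第一预设阈值”的过程可以用如下草图示意（quantize_fn与error_fn为假设的回调：前者按给定位宽量化并反量化，后者按上文方式计算量化误差；这里假设误差随位宽增大而减小）：

```python
import numpy as np

def grow_bit_width(data: np.ndarray, n1: int, t1: float, step: int,
                   quantize_fn, error_fn) -> int:
    # 量化误差大于等于第一预设阈值t1时，按第一预设位宽步长step增大位宽并重新量化
    n = n1
    err = error_fn(data, quantize_fn(data, n))
    while err >= t1:
        n += step                                  # 第一中间数据位宽 n2 = n1 + step
        err = error_fn(data, quantize_fn(data, n))
    return n  # 量化误差小于t1时对应的数据位宽，即目标数据位宽
```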
可选地,图3-11示出本公开又一实施例中数据位宽调整方法的流程图,如图3-11所示,上述操作S115还可以包括:
S1152、处理器可以判断上述的量化误差是否小于或等于第二预设阈值。
若所述量化误差小于或等于第二预设阈值,则可以执行操作S1153,减小所述当前检验迭代对应的数据位宽,获得当前检验迭代的目标数据位宽。当量化误差大于第二预设阈值时,则可以保持当前检验迭代的数据位宽不变。
进一步可选地,处理器可以经过一次调整获得上述的目标数据位宽。例如,当前检验迭代的初始数据位宽为n1,处理器可以经一次调整确定该目标数据位宽n2=n1-t,其中,t为数据位宽的调整值。其中,采用该目标数据位宽n2对当前检验迭代的待量化数据进行量化时,获得的量化误差可以大于所述第二预设阈值。
进一步可选地，处理器可以经过多次调整获得目标数据位宽，直至量化误差大于第二预设阈值，并将该量化误差大于第二预设阈值时的数据位宽作为目标数据位宽。具体地，若所述量化误差小于或等于第二预设阈值，则根据第二预设位宽步长确定第二中间数据位宽；之后处理器可以根据该第二中间数据位宽对当前检验迭代的待量化数据进行量化，获得量化数据，并根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据，确定量化误差，直至所述量化误差大于所述第二预设阈值。处理器可以将该量化误差大于第二预设阈值时对应的数据位宽，作为该目标数据位宽。
例如,当前检验迭代的初始数据位宽为n1,处理器可以采用该初始数据位宽n1对当前检验迭代的待量化数据A进行量化,获得量化数据B1,并根据该待量化数据A和量化数据B1计算获得量化误差C1。在量化误差C1小于或等于第二预设阈值时,处理器确定第二中间数据位宽n2=n1-t2,其中,t2为第二预设位宽步长。之后,处理器可以根据该第二中间数据位宽n2对当前检验迭代的待量化数据进行量化,获得当前检验迭代的量化数据B2,并根据该待量化数据A和量化数据B2计算获得量化误差C2。若该量化误差C2小于或等于第二预设阈值时,处理器确定第二中间数据位宽n2=n1-t2-t2,之后根据该新的第二中间数据位宽对当前检验迭代的待量化数据A进行量化,并计算相应的量化误差,直至量化误差大于第二预设阈值。若量化误差C1大于第二预设阈值,则可以保持该初始数据位宽n1不变。
更进一步地,上述的第二预设位宽步长可以是恒定值,例如,每当量化误差小于第二预设阈值时,则处理器可以将当前检验迭代对应的数据位宽减小相同的位宽值。可选地,上述的第二预设位宽步长也可以是可变值,例如,处理器可以计算量化误差与第二预设阈值的差值,若该量化误差与第二预设阈值的差值越小,则第二预设位宽步长的值越小。
可选地,图3-12示出本公开再一实施例中数据位宽调整方法的流程图,如图3-12所示,当处理器确定量化误差小于第一预设阈值,且量化误差大于第二预设阈值时,可以保持当前检验迭代的数据位宽不变,其中,第一预设阈值大于第二预设阈值。即当前检验迭代的目标数据位宽可以等于初始数据位宽。其中,图3-12中仅以举例的方式说明本公开一实施例的数据位宽确定方式,图3-12中各个操作的顺序可以适应性调整,此处并不作具体限定。
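结合上述操作S1150~S1153,数据位宽的调整可以概括为:量化误差不小于第一预设阈值时按第一预设位宽步长增大位宽,不大于第二预设阈值时按第二预设位宽步长减小位宽,介于两者之间时保持不变。下面给出该过程的一个示意性Python草图(其中quant_error(fx, n)为示意性假设的回调函数,返回以位宽n量化fx时的量化误差,其余变量名亦为假设):

```python
def adjust_bit_width(fx, n, t1, t2, th1, th2, quant_error):
    """示意:按S1150~S1153的思路确定当前检验迭代的目标数据位宽(th1 > th2)。"""
    err = quant_error(fx, n)
    while err >= th1:              # 误差过大:按第一预设位宽步长t1增大位宽
        n += t1
        err = quant_error(fx, n)
    while err <= th2:              # 误差过小:按第二预设位宽步长t2减小位宽
        n -= t2
        err = quant_error(fx, n)
    return n                       # 误差介于th2与th1之间:保持该位宽作为目标数据位宽
```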
本公开实施例中,由于当前检验迭代的数据位宽发生变化时,会相应的带来点位置的变化。但此时点位置的变化并非是待量化数据的数据变动引起的,根据上述公式(2-30)确定的第一误差计算获得的目标迭代间隔可能不准确,从而会影响量化的精度。因此,在当前检验迭代的数据位宽发生变化时,可以相应的调整上述的第二均值,以保证第一误差能够准确的反映点位置的变动幅度,进而保证目标迭代间隔的准确性和可靠性。具体地,图3-13示出本公开另一实施例中第二均值的确定方法的流程图,如图3-13所示,上述方法还可以包括:
S116、根据所述目标数据位宽,确定所述当前检验迭代的数据位宽调整值;
具体地,处理器可以根据当前检验迭代的目标数据位宽和初始数据位宽,确定当前检验迭代的数据位宽调整值。其中,该数据位宽调整值=目标数据位宽-初始数据位宽。当然,处理器还可以直接获得当前检验迭代的数据位宽调整值。
S117、根据所述当前检验迭代的数据位宽调整值,更新上述的第二均值。
具体地,若数据位宽调整值大于预设参数(例如,该预设参数可以等于零)时,即当前检验迭代的数据位宽增加时,处理器可以相应地减少第二均值。若数据位宽调整值小于预设参数(例如,该预设参数可以等于零)时,即当前检验迭代的数据位宽减少时,处理器可以相应地增加第二均值。若数据位宽调整值等于预设参数,即当数据位宽调整值等于0时,此时当前检验迭代对应的待量化数据未发生改变,更新后的第二均值等于更新前的第二均值,该更新前的第二均值根据上述公式(2-29)计算获得。可选地,若数据位宽调整值等于预设参数,即当数据位宽调整值等于0时,处理器可以不更新第二均值,即处理器可以不执行上述操作S117。
例如,更新前的第二均值M2=β×s_t+(1−β)×M1;当前检验迭代对应的目标数据位宽n2=初始数据位宽n1+Δn时,其中,Δn表示数据位宽调整值,此时,更新后的第二均值M2=β×(s_t−Δn)+(1−β)×(M1−Δn)。当前检验迭代对应的目标数据位宽n2=初始数据位宽n1−Δn时,其中,Δn表示数据位宽调整值,此时,更新后的第二均值M2=β×(s_t+Δn)+(1−β)×(M1+Δn),其中,s_t是指根据目标数据位宽确定的当前检验迭代的点位置。
再如,更新前的第二均值M2=β×s_t+(1−β)×M1;当前检验迭代对应的目标数据位宽n2=初始数据位宽n1+Δn时,其中,Δn表示数据位宽调整值,此时,更新后的第二均值M2=β×s_t+(1−β)×M1−Δn。再如,当前检验迭代对应的目标数据位宽n2=初始数据位宽n1−Δn时,其中,Δn表示数据位宽调整值,此时,更新后的第二均值M2=β×s_t+(1−β)×M1+Δn,其中,s_t是指根据目标数据位宽确定的当前检验迭代的点位置。
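上述第二种更新方式可以统一写成带符号的形式:设Δn=目标数据位宽−初始数据位宽(位宽增大时为正,减小时为负),则更新后的第二均值M2=β×s_t+(1−β)×M1−Δn。下面给出该统一形式的一个最小Python示意(仅为示例性草图,变量名为假设):

```python
def update_second_mean(s_t: float, m1: float, delta_n: int, beta: float) -> float:
    """示意:数据位宽变化时更新第二均值。delta_n为有符号的数据位宽调整值,
    delta_n为0(位宽不变)时返回值即为不作平移的第二均值。"""
    return beta * s_t + (1 - beta) * m1 - delta_n
```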
进一步地,如图3-6所示,上述操作S200可以包括:
S210、根据点位置的变动幅度,确定第一目标迭代间隔,其中,该第一目标迭代间隔与上述的点位置的变动幅度负相关。即上述的点位置的变动幅度越大,则该第一目标迭代间隔越小。上述的点位置的变动幅度越小,则该第一目标迭代间隔越大。
如上所述,上述的第一误差可以表征点位置的变动幅度,因此,如图3-7所示,上述操作S210可以包括:
S211、处理器可以根据所述第一误差确定所述第一目标迭代间隔,其中,第一目标迭代间隔与所述第一误差负相关。即第一误差越大,则说明点位置的变化幅度越大,进而表明待量化数据的数据变动幅度越大,此时,第一目标迭代间隔越小。
具体地,处理器可以根据以下公式计算得到第一目标迭代间隔I:

I = δ / diff_update1 − γ

其中,I为第一目标迭代间隔,diff_update1表示上述的第一误差,δ和γ可以为超参数。
可以理解的是,第一误差可以用于衡量点位置的变动幅度,第一误差越大,表明点位置的变动幅度越大,进而说明待量化数据的数据变动幅度越大,第一目标迭代间隔需要设置得越小。也就是说,第一误差越大,量化参数的调整越频繁。
在本实施例中,通过计算点位置的变动幅度(第一误差),并根据点位置的变动幅度确定第一目标迭代间隔。由于量化参数根据第一目标迭代间隔确定,也就使得根据量化参数进行量化得到的量化数据,能够更加符合目标数据的点位置的变动趋势,在保证量化精度的同时,提高循环神经网络的运行效率。
可选地,处理器在当前检验迭代处确定第一目标迭代间隔后,可以进一步在当前检验迭代处确定 第一目标迭代间隔对应的量化参数和数据位宽等参数,从而根据第一目标迭代间隔更新量化参数。其中,量化参数可以包括点位置和/或缩放系数。进一步地,该量化参数还可以包括偏移量。该量化参数的具体计算方式可参见上文中的描述。图3-14示出本公开另一实施例的量化参数调整方法的流程图,如图3-14所示,上述方法还可以包括:
S300、处理器根据第一目标迭代间隔调整循环神经网络运算中的量化参数。
具体地,处理器可以根据第一目标迭代间隔以及每个周期中迭代的总数确定更新迭代(亦可称检验迭代),并在各个更新迭代处更新第一目标迭代间隔,还可以在各个更新迭代处更新量化参数。例如,循环神经网络运算中的数据位宽保持不变,此时,处理器可以在各个更新迭代处直接根据更新迭代的待量化数据,调整点位置等量化参数。再如,循环神经网络运算中的数据位宽可变,此时,处理器可以在各个更新迭代处更新数据位宽,并根据更新后的数据位宽和该更新迭代的待量化数据,调整点位置等量化参数。
本公开实施例中,处理器在各个检验迭代处更新量化参数,以保证当前量化参数满足待量化数据的量化需求。其中,更新前的第一目标迭代间隔与更新后的第一目标迭代间隔可以相同,也可以不同。更新前的数据位宽与更新后的数据位宽可以相同,也可以不同;即不同迭代间隔的数据位宽可以相同,也可以不同。更新前的量化参数与更新后的量化参数可以相同,也可以不同;即不同迭代间隔的量化参数可以相同,也可以不同。
可选地,上述操作S300中,处理器可以在更新迭代处确定第一目标迭代间隔中的量化参数,以调整循环神经网络运算中的量化参数。
在一种可能的实现方式中,当该方法用于循环神经网络的训练或微调过程中时,操作S200可以包括:
处理器确定当前检验迭代是否大于第一预设迭代,其中,在当前检验迭代大于第一预设迭代时,根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。在当前检验迭代小于或等于第一预设迭代时,根据预设迭代间隔调整所述量化参数。
其中,当前检验迭代是指处理器当前执行的迭代运算。可选地,该第一预设迭代可以是超参数,该第一预设迭代可以是根据待量化数据的数据变动曲线确定的,该第一预设迭代也可以是用户自定义设置的。可选地,该第一预设迭代可以小于一个周期(epoch)包含的迭代总数,其中,一个周期是指数据集中的所有待量化数据均完成一次正向运算和一次反向运算。
可选地,处理器可以读取用户输入的第一预设迭代,并根据该第一预设迭代与预设迭代间隔的对应关系,确定预设迭代间隔。可选地,该预设迭代间隔可以是超参数,该预设迭代间隔也可以是用户自定义设置的。此时,处理器可以直接读取用户输入的第一预设迭代和预设迭代间隔,并根据该预设迭代间隔更新循环神经网络运算中的量化参数。本公开实施例中,处理器无需根据待量化数据的数据变动幅度,确定目标迭代间隔。
例如,用户输入的第一预设迭代为第100次迭代,预设迭代间隔为5,则在当前检验迭代小于或等于第100次迭代时,可以根据预设迭代间隔更新量化参数。即处理器可以确定在循环神经网络的训练或微调的第1次迭代至第100次迭代,每间隔5次迭代更新一次量化参数。具体地,处理器可以确定第1次迭代对应的数据位宽n1及点位置s1等量化参数,并采用该数据位宽n1和点位置s1等量化参数对第1次迭代至第5次迭代的待量化数据进行量化,即第1次迭代至第5次迭代可以采用相同的量化参数。之后,处理器可以确定第6次迭代对应的数据位宽n2及点位置s2等量化参数,并采用该数据位宽n2和点位置s2等量化参数对第6次迭代至第10次迭代的待量化数据进行量化,即第6次迭代至第10次迭代可以采用相同的量化参数。同理,处理器可以按照上述量化方式直至完成第100次迭代。其中,每个迭代间隔中数据位宽及点位置等量化参数的确定方式可以参见上文的描述,此处不再赘述。
再如,用户输入的第一预设迭代为第100次迭代,预设迭代间隔为1,则在当前检验迭代小于或 等于第100次迭代时,可以根据预设迭代间隔更新量化参数。即处理器可以确定在循环神经网络的训练或微调的第1次迭代至第100次迭代,每次迭代均更新量化参数。具体地,处理器可以确定第1次迭代对应的数据位宽n1及点位置s1等量化参数,并采用该数据位宽n1和点位置s1等量化参数对第1次迭代的待量化数据进行量化。之后,处理器可以确定第2次迭代对应的数据位宽n2及点位置s2等量化参数,并采用该数据位宽n2和点位置s2等量化参数对第2次迭代的待量化数据进行量化,……。同理,处理器可以确定出第100次迭代的数据位宽n100以及点位置s100等量化参数,并采用该数据位宽n100和点位置s100等量化参数对第100次迭代的待量化数据进行量化。其中,每个迭代间隔中数据位宽及点位置等量化参数的确定方式可以参见上文的描述,此处不再赘述。
上文仅以数据位宽和量化参数同步更新的方式举例说明,在其他可选的实施例中,在每个目标迭代间隔中,处理器还可以根据点位置的变动幅度确定点位置的迭代间隔,并根据该点位置迭代间隔更新点位置等量化参数。
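作为上述更新策略的一个简化示意,下面的Python小草图给出在当前检验迭代不大于第一预设迭代时按预设迭代间隔、否则按根据数据变动幅度确定的第一目标迭代间隔确定下一次更新迭代的方式(函数名与入参均为示意性假设):

```python
def next_update_iter(cur_iter: int, first_preset_iter: int,
                     preset_interval: int, target_interval: int) -> int:
    """示意:返回下一次更新量化参数的迭代序号。"""
    if cur_iter <= first_preset_iter:
        return cur_iter + preset_interval   # 训练初期:按预设迭代间隔更新
    return cur_iter + target_interval       # 训练中后期:按第一目标迭代间隔更新
```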
可选地,在当前检验迭代大于第一预设迭代时,可以表明循环神经网络的训练或微调处于中期阶段,此时可以获得历史迭代的待量化数据的数据变动幅度,并根据该待量化数据的数据变动幅度确定第一目标迭代间隔,该第一目标迭代间隔可以大于上述的预设迭代间隔,从而可以减少量化参数的更新次数,提高量化效率及运算效率。具体地,在所述当前检验迭代大于第一预设迭代时,根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
承接上例,用户输入的第一预设迭代为第100次迭代,预设迭代间隔为1,则在当前检验迭代小于或等于第100次迭代时,可以根据预设迭代间隔更新量化参数。即处理器可以确定在循环神经网络的训练或微调的第1次迭代至第100次迭代,每次迭代均更新量化参数,具体实现方式可以参见上文中的描述。在当前检验迭代大于第100次迭代时,处理器可以根据当前检验迭代的待量化数据及其之前的历史迭代的待量化数据,确定待量化数据的数据变动幅度,并根据该待量化数据的数据变动幅度确定第一目标迭代间隔。具体地,在当前检验迭代大于第100次迭代时,处理器可以自适应地调整当前检验迭代对应的数据位宽,获得该当前检验迭代对应的目标数据位宽,并将该当前检验迭代对应的目标数据位宽作为第一目标迭代间隔的数据位宽,其中,第一目标迭代间隔中迭代对应的数据位宽一致。同时,处理器可以根据当前检验迭代对应的目标数据位宽和待量化数据,确定当前检验迭代对应的点位置,并根据当前检验迭代对应的点位置确定第一误差。处理器还可以根据当前检验迭代对应的待量化数据,确定量化误差,并根据量化误差确定第二误差。之后,处理器可以根据第一误差和第二误差确定第一目标迭代间隔,该第一目标迭代间隔可以大于上述的预设迭代间隔。进一步地,处理器可以确定第一目标迭代间隔中的点位置或缩放系数等量化参数,具体确定方式可参见上文中的描述。
例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定第一目标迭代间隔的迭代间隔为3,则处理器可以确定该第一目标迭代间隔包括3次迭代,分别为第100次迭代、第101次迭代和第102次迭代。处理器还可以根据第100次迭代的待量化数据确定量化误差,并根据量化误差确定第二误差和第100次迭代对应的目标数据位宽,将该目标数据位宽作为第一目标迭代间隔对应的数据位宽,其中,第100次迭代、第101次迭代和第102次迭代对应的数据位宽均为该第100次迭代对应的目标数据位宽。处理器还可以根据该第100次迭代的待量化数据和该第100次迭代对应的目标数据位宽确定该第100次迭代对应的点位置和缩放系数等量化参数。之后,采用该第100次迭代对应的量化参数对第100次迭代、第101次迭代和第102次迭代进行量化。
在一种可能的实现方式中,操作S200还可以包括:
在当前检验迭代大于或等于第二预设迭代、且当前检验迭代需要进行量化参数调整时,根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔;
根据所述第二目标迭代间隔确定出与所述当前检验迭代相对应的更新迭代,以在所述更新迭代中调整所述量化参数,所述更新迭代为所述当前检验迭代之后的迭代;
其中,所述第二预设迭代大于第一预设迭代,所述循环神经网络的量化调整过程包括多个周期, 所述多个周期中迭代的总数不一致。
在当前检验迭代大于第一预设迭代时,处理器可以进一步确定当前检验迭代是否大于第二预设迭代。其中,所述第二预设迭代大于所述第一预设迭代,所述第二预设迭代间隔大于所述预设迭代间隔。可选地,上述第二预设迭代可以是超参数,第二预设迭代可以大于至少一个周期的迭代总数。可选地,第二预设迭代可以根据待量化数据的数据变动曲线确定。可选地,第二预设迭代也可以是用户自定义设置的。
在一种可能的实现方式中,根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔,包括:
根据当前检验迭代在当前周期中的迭代排序数和所述当前周期之后的周期中迭代的总数,确定出对应所述当前检验迭代的更新周期,所述更新周期中迭代的总数大于或等于所述迭代排序数;
根据所述第一目标迭代间隔、所述迭代排序数、所述当前周期与更新周期之间的周期中迭代的总数,确定出所述第二目标迭代间隔。
举例来说,如图3-5c所示,假定第一目标迭代间隔I=1。在第一周期iter1的t1迭代中确定需要更新量化参数,则第一周期iter1的t1迭代所对应的下一个更新迭代可以为第二周期iter2中的t1迭代。在第一周期iter1的t2迭代中确定需要更新量化参数,由于第一周期iter1的t2迭代的迭代排序数3大于第二周期的迭代总数,则第一周期iter1的t2迭代所对应的下一个更新迭代会变为第三周期iter3中的t2迭代。在第一周期iter1的t3迭代中确定需要更新量化参数,由于第一周期iter1的t3迭代的迭代排序数4大于第二周期、第三周期的迭代总数,则第一周期iter1的t3迭代所对应的下一个更新迭代会变为第四周期iter4中的t3迭代。
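按照上述描述与图3-5c的例子,确定更新周期与第二目标迭代间隔的过程可以用如下Python草图示意。该草图是在第一目标迭代间隔I=1、且更新迭代为更新周期中与当前检验迭代同迭代排序数的迭代这一简化假设下给出的,period_lengths等名称均为示意性假设:

```python
def next_update_iteration(order_idx: int, cur_period: int, period_lengths: list):
    """示意:period_lengths[p]为第p个周期的迭代总数,order_idx为当前检验迭代
    在当前周期中的迭代排序数。返回(更新周期序号, 第二目标迭代间隔)。"""
    interval = period_lengths[cur_period] - order_idx   # 先走完当前周期剩余的迭代
    for p in range(cur_period + 1, len(period_lengths)):
        if period_lengths[p] >= order_idx:              # 迭代总数不小于排序数的周期即更新周期
            return p, interval + order_idx              # 更新迭代为更新周期中的同序迭代
        interval += period_lengths[p]                   # 否则整个周期被跳过
    raise ValueError("后续周期中不存在迭代总数不小于迭代排序数的更新周期")
```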
这样,处理器可以根据预设迭代间隔和第二目标迭代间隔对量化参数和第一目标迭代间隔进行更新。为便于描述,本文中将实际用于更新量化参数和第一目标迭代间隔的预设迭代间隔和第二目标迭代间隔统称为参考迭代间隔或目标迭代间隔。
在一种情况下,该循环神经网络运算中的各次迭代对应的数据位宽均不发生变化,即该循环神经网络运算中各次迭代对应的数据位宽均相同,此时,处理器可以通过确定参考迭代间隔中的点位置等量化参数,实现根据参考迭代间隔对循环神经网络运算中的量化参数的调整的目的。其中,该参考迭代间隔中迭代对应的量化参数可以是一致的。也就是说,参考迭代间隔中的各次迭代均采用同一点位置,仅在各次检验迭代处更新确定点位置等量化参数,从而可以避免每次迭代都对量化参数进行更新调整,从而减少了量化过程中的计算量,提高了量化操作的效率。
可选地,针对上述数据位宽不变的情况,参考迭代间隔中迭代对应的点位置可以保持一致。具体地,处理器可以根据当前检验迭代的待量化数据和该当前检验迭代对应的目标数据位宽,确定当前检验迭代对应的点位置,并将该当前检验迭代对应的点位置作为该参考迭代间隔对应的点位置,该参考迭代间隔中迭代均沿用当前检验迭代对应的点位置。可选地,该当前检验迭代对应的目标数据位宽可以是超参数。例如,该当前检验迭代对应的目标数据位宽是由用户自定义输入的。该当前检验迭代对应的点位置可以参见上文的公式(2-2)或公式(2-14)计算。
在一种情况下,该循环神经网络运算中的各次迭代对应的数据位宽可以发生变化,即不同参考迭代间隔对应的数据位宽可以不一致,但参考迭代间隔中各次迭代的数据位宽保持不变。其中,该参考迭代间隔中迭代对应的数据位宽可以是超参数,例如,该参考迭代间隔中迭代对应的数据位宽可以是用户自定义输入的。在一种情况下,该参考迭代间隔中迭代对应的数据位宽也可以是处理器计算获得的,例如,处理器可以根据当前检验迭代的待量化数据,确定当前检验迭代对应的目标数据位宽,并将该当前检验迭代对应的目标数据位宽作为参考迭代间隔对应的数据位宽。
此时,为简化量化过程中的计算量,该参考迭代间隔中对应的点位置等量化参数也可以保持不变。也就是说,参考迭代间隔中的各次迭代均采用同一点位置,仅在各次检验迭代处更新确定点位置等量化参数以及数据位宽,从而可以避免每次迭代都对量化参数进行更新调整,从而减少了量化过程中的计算量,提高了量化操作的效率。
可选地,针对上述参考迭代间隔对应的数据位宽不变的情况,参考迭代间隔中迭代对应的点位置可以保持一致。具体地,处理器可以根据当前检验迭代的待量化数据和该当前检验迭代对应的目标数据位宽,确定当前检验迭代对应的点位置,并将该当前检验迭代对应的点位置作为该参考迭代间隔对应的点位置,该参考迭代间隔中迭代均沿用当前检验迭代对应的点位置。可选地,该当前检验迭代对应的目标数据位宽可以是超参数。例如,该当前检验迭代对应的目标数据位宽是由用户自定义输入的。该当前检验迭代对应的点位置可以参见上文的公式(2-2)或公式(2-14)计算。
可选地,参考迭代间隔中迭代对应的缩放系数可以一致。处理器可以根据当前检验迭代的待量化数据,确定当前检验迭代对应的缩放系数,并将该当前检验迭代对应的缩放系数作为参考迭代间隔中各次迭代的缩放系数。其中,该参考迭代间隔中迭代对应的缩放系数一致。
可选地,参考迭代间隔中迭代对应的偏移量一致。处理器可以根据当前检验迭代的待量化数据,确定当前检验迭代对应的偏移量,并将该当前检验迭代对应的偏移量作为参考迭代间隔中各次迭代的偏移量。进一步地,处理器还可以确定待量化数据所有元素中的最小值和最大值,并进一步确定点位置和缩放系数等量化参数,具体可参见上文中的描述。该参考迭代间隔中迭代对应的偏移量一致。
例如,该参考迭代间隔可以从当前检验迭代开始计算迭代数量,即参考迭代间隔对应的检验迭代可以是参考迭代间隔的起始迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定参考迭代间隔的迭代间隔为3,则处理器可以确定该参考迭代间隔包括3次迭代,分别为第100次迭代、第101次迭代和第102次迭代。进而处理器可以根据第100次迭代对应的待量化数据和目标数据位宽,确定该第100次迭代对应的点位置等量化参数,并可以采用该第100次迭代对应的点位置等量化参数对第100次迭代、第101次迭代和第102次迭代进行量化。这样,处理器在第101次迭代和第102次迭代无需计算点位置等量化参数,减少了量化过程中的计算量,提高了量化操作的效率。
可选地,参考迭代间隔还可以是从当前检验迭代的下一次迭代开始计算迭代数量,即该参考迭代间隔对应的检验迭代也可以是该参考迭代间隔的终止迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定参考迭代间隔的迭代间隔为3。则处理器可以确定该参考迭代间隔包括3次迭代,分别为第101次迭代、第102次迭代和第103次迭代。进而处理器可以根据第100次迭代对应的待量化数据和目标数据位宽,确定该第100次迭代对应的点位置等量化参数,并可以采用该第100次迭代对应的点位置等量化参数对第101次迭代、第102次迭代和第103次迭代进行量化。这样,处理器在第102次迭代和第103次迭代无需计算点位置等量化参数,减少了量化过程中的计算量,提高了量化操作的效率。
本公开实施例中,同一参考迭代间隔中各次迭代对应的数据位宽及量化参数均一致,即同一参考迭代间隔中各次迭代对应的数据位宽、点位置、缩放系数及偏移量均保持不变,从而在循环神经网络的训练或微调过程中,可以避免频繁地调整待量化数据的量化参数,减少了量化过程中的计算量,从而可以提高量化效率。并且,通过在训练或微调的不同阶段根据数据变动幅度,动态地调整量化参数,可以保证量化精度。
在另一情况下,该循环神经网络运算中的各次迭代对应的数据位宽可以发生变化,但参考迭代间隔中各次迭代的数据位宽保持不变。此时,参考迭代间隔中迭代对应的点位置等量化参数也可以不一致。处理器还可以根据当前检验迭代对应的目标数据位宽,确定参考迭代间隔对应的数据位宽,其中,参考迭代间隔中迭代对应的数据位宽一致。之后,处理器可以根据该参考迭代间隔对应的数据位宽和点位置迭代间隔,调整循环神经网络运算过程中的点位置等量化参数。可选地,图3-15示出本公开一实施例的量化参数调整方法中调整量化参数的流程图,如图3-15所示,上述操作S300还可以包括:
S310、根据当前检验迭代的待量化数据,确定参考迭代间隔对应的数据位宽;其中,该参考迭代间隔中迭代对应的数据位宽一致。也就是说,循环神经网络运算过程中的数据位宽每隔一个参考迭代间隔更新一次。可选地,该参考迭代间隔对应的数据位宽可以为当前检验迭代的目标数据位宽。该当前检验迭代的目标数据位宽可参见上文中的操作S114和S115,此处不再赘述。
例如,该参考迭代间隔可以从当前检验迭代开始计算迭代数量,即参考迭代间隔对应的检验迭代可以是参考迭代间隔的起始迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数 据变动幅度,确定参考迭代间隔的迭代间隔为6,则处理器可以确定该参考迭代间隔包括6次迭代,分别为第100次迭代至第105次迭代。此时,处理器可以确定第100次迭代的目标数据位宽,且第101次迭代至第105次迭代沿用该第100次迭代的目标数据位宽,无需在第101次迭代至第105次迭代计算目标数据位宽,从而减少计算量,提高量化效率及运算效率。之后,第106次迭代可以作为当前检验迭代,并重复上述确定参考迭代间隔,以及更新数据位宽的操作。
可选地,参考迭代间隔还可以是从当前检验迭代的下一次迭代开始计算迭代数量,即该参考迭代间隔对应的检验迭代也可以是该参考迭代间隔的终止迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定参考迭代间隔的迭代间隔为6。则处理器可以确定该参考迭代间隔包括6次迭代,分别为第101次迭代至第106次迭代。此时,处理器可以确定第100次迭代的目标数据位宽,且第101次迭代至106次迭代沿用该第100次迭代的目标数据位宽,无需在第101次迭代至106次迭代计算目标数据位宽,从而减少计算量,提高量化效率及运算效率。之后,第106次迭代可以作为当前检验迭代,并重复上述确定参考迭代间隔,以及更新数据位宽的操作。
S320、处理器根据获取的点位置迭代间隔和所述参考迭代间隔对应的数据位宽,调整所述参考迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置等量化参数。
其中,所述点位置迭代间隔包含至少一次迭代,所述点位置迭代间隔中迭代的点位置一致。可选地,该点位置迭代间隔可以是超参数,例如,该点位置迭代间隔可以是用户自定义输入的。
可选地,所述点位置迭代间隔小于或等于所述参考迭代间隔。当该点位置迭代间隔与上述的参考迭代间隔相同时,处理器可以在当前检验迭代处同步更新数据位宽和点位置等量化参数。进一步可选地,参考迭代间隔中迭代对应的缩放系数可以一致。更进一步地,参考迭代间隔中迭代对应的偏移量一致。此时,该参考迭代间隔中的迭代对应的数据位宽和点位置等量化参数均相同,从而可以降低计算量,提高量化效率和运算效率。具体实现过程与上述实施例基本一致,可参见上文的描述,此处不再赘述。
当点位置迭代间隔小于上述的参考迭代间隔时,处理器可以在参考迭代间隔对应的检验迭代处更新数据位宽和点位置等量化参数,并在该点位置迭代间隔确定的子检验迭代处更新点位置等量化参数。由于在数据位宽不变的情况下,点位置等量化参数可以根据待量化数据进行微调,因此,在同一参考迭代间隔内也可以对点位置等量化参数进行调整,以进一步提高量化精度。
具体地,处理器可以根据当前检验迭代和点位置迭代间隔确定子检验迭代,该子检验迭代用于调整点位置,该子检验迭代可以是参考迭代间隔中的迭代。进一步地,处理器可以根据子检验迭代的待量化数据和参考迭代间隔对应的数据位宽,调整参考迭代间隔中迭代对应的点位置,其中,点位置的确定方式可以参照上述的公式(2-2)或公式(2-14),此处不再赘述。
例如,当前检验迭代为第100次迭代,该参考迭代间隔为6,该参考迭代间隔包含的迭代为第100次迭代至第105次迭代。处理器获取的点位置迭代间隔为I s1=3,则可以从当前检验迭代开始间隔三次迭代调整一次点位置。具体地,处理器可以将第100次迭代作为上述的子检验迭代,并计算获得该第100次迭代对应的点位置s1,在第100次迭代、第101次迭代和第102次迭代共用点位置s1进行量化。之后,处理器可以根据点位置迭代间隔I s1将第103次迭代作为上述的子检验迭代,同时处理器还可以根据第103次迭代对应的待量化数据和参考迭代间隔对应的数据位宽n确定第二个点位置迭代间隔对应的点位置s2,则在第103次迭代至第105次迭代中可以共用上述的点位置s2进行量化。本公开实施例中,上述的更新前的点位置s1与更新后的点位置s2的值可以相同,也可以不同。进一步地,处理器可以在第106次迭代重新根据待量化数据的数据变动幅度,确定下一参考迭代间隔以及该下一参考迭代间隔对应的数据位宽及点位置等量化参数。
再如,当前检验迭代为第100次迭代,该参考迭代间隔为6,该参考迭代间隔包含的迭代为第101次迭代至第106次迭代。处理器获取的点位置迭代间隔为I_s1=3,则可以从当前检验迭代开始间隔三次迭代调整一次点位置。具体地,处理器可以根据当前检验迭代的待量化数据和当前检验迭代对应的目标数据位宽n1,确定第一个点位置迭代间隔对应的点位置为s1,则在第101次迭代、第102次迭代和第103次迭代共用上述的点位置s1进行量化。之后,处理器可以根据点位置迭代间隔I_s1将第104次迭代作为上述的子检验迭代,同时处理器还可以根据第104次迭代对应的待量化数据和参考迭代间隔对应的数据位宽n1确定第二个点位置迭代间隔对应的点位置s2,则在第104次迭代至第106次迭代中可以共用上述的点位置s2进行量化。本公开实施例中,上述的更新前的点位置s1与更新后的点位置s2的值可以相同,也可以不同。进一步地,处理器可以在第106次迭代重新根据待量化数据的数据变动幅度,确定下一参考迭代间隔以及该下一参考迭代间隔对应的数据位宽及点位置等量化参数。
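上述两个例子中,子检验迭代的位置可以用如下Python小草图示意(仅为示例,函数名为假设):

```python
def sub_check_iterations(start_iter: int, ref_interval: int, pos_interval: int):
    """示意:返回参考迭代间隔内需要重新确定点位置的各子检验迭代的序号。"""
    return list(range(start_iter, start_iter + ref_interval, pos_interval))

# 例如参考迭代间隔从第100次迭代开始、长度为6、点位置迭代间隔为3时,
# sub_check_iterations(100, 6, 3) 返回 [100, 103],与上文第一个例子一致。
```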
可选地,该点位置迭代间隔可以等于1,即每次迭代均更新一次点位置。可选地,各点位置迭代间隔可以相同,也可以不同。例如,该参考迭代间隔包含的至少一个点位置迭代间隔可以是依次增大的。此处仅以举例的方式说明本实施例的实现方式,并不用于限定本公开。
可选地,该参考迭代间隔中迭代对应的缩放系数也可以不一致。进一步可选地,该缩放系数可以与上述的点位置同步更新,也就是说,该缩放系数对应的迭代间隔可以等于上述的点位置迭代间隔。即每当处理器更新确定点位置时,会相应地更新确定缩放系数。
可选地,该参考迭代间隔中迭代对应的偏移量也可以不一致。进一步地,该偏移量可以与上述的点位置同步更新,也就是说,该偏移量对应的迭代间隔可以等于上述的点位置迭代间隔。即每当处理器更新确定点位置时,会相应地更新确定偏移量。当然,该偏移量也可以与上述的点位置或数据位宽异步更新,此处不作具体限定。更进一步地,处理器还可以确定待量化数据所有元素中的最小值和最大值,并进一步确定点位置和缩放系数等量化参数,具体可参见上文中的描述。
在另一种实施例中,处理器可以根据点位置的变动幅度和待量化数据的数据位宽的变化,综合确定待量化数据的数据变动幅度,并根据该待量化数据的数据变动幅度确定参考迭代间隔,其中,该参考迭代间隔可以用于更新确定数据位宽,即处理器可以在每个参考迭代间隔的检验迭代处更新确定数据位宽。由于点位置可以反映定点数据的精度,数据位宽可以反映定点数据的数据表示范围,因而通过综合点位置的变动幅度和待量化数据的数据位宽变化,可以保证量化后的数据既能够兼顾精度,也能够满足数据表示范围。可选地,点位置的变化幅度可以采用上述的第一误差进行表征,数据位宽的变化可以根据上述的量化误差进行确定。具体地,图3-16示出本公开另一实施例的参数调整方法中第一目标迭代间隔的确定方法的流程图,如图3-16所示,上述方法可以包括:
S400、获取第一误差,第一误差能够表征点位置的变动幅度,该点位置的变动幅度可以表示待量化数据的数据变动幅度;具体地,上述第一误差的计算方式可参见上文中操作S110中的描述,此处不再赘述。
S500、获取第二误差,所述第二误差用于表征所述数据位宽的变化。
可选地,上述的第二误差可以根据量化误差进行确定,该第二误差与上述的量化误差正相关。在一种可能的实现方式中,上述操作S500可以包括:
根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差;
根据所述量化误差确定所述第二误差,所述第二误差与所述量化误差正相关。
其中,所述当前检验迭代的量化数据根据初始数据位宽对所述当前检验迭代的待量化数据进行量化获得。其中,具体的量化误差确定方式可参见上文中操作S114中的描述,此处不再赘述。
具体地,第二误差可以按照如下公式进行计算:

diff_update2 = θ × diff_bit²   公式(2-34)

其中,diff_update2表示上述的第二误差,diff_bit表示上述的量化误差,θ可以为超参数。
S600、根据所述第二误差和所述第一误差,确定所述第一目标迭代间隔。
具体地,处理器可以根据第一误差和第二误差计算获得目标误差,并根据目标误差确定目标迭代间隔。可选地,目标误差可以是第一误差和第二误差进行加权平均计算获得。例如,目标误差=K*第一误差+(1-K)*第二误差,其中,K为超参数。之后,处理器可以根据该目标误差确定目标迭代间隔,目标迭代间隔与该目标误差负相关。即目标误差越大,目标迭代间隔越小。
可选地,该目标误差还可以根据第一误差和第二误差中的最值进行确定,此时第一误差或第二误差的权重取值为0。在一种可能的实现方式中,上述操作S600可以包括:
将所述第一误差和所述第二误差中最大值作为目标误差;
根据所述目标误差确定所述第一目标迭代间隔,其中,所述目标误差与所述第一目标迭代间隔负相关。
具体地,处理器可以比较第一误差diff_update1和第二误差diff_update2的大小,当第一误差diff_update1大于第二误差diff_update2时,则该目标误差等于第一误差diff_update1。当第一误差diff_update1小于第二误差diff_update2时,则该目标误差等于第二误差diff_update2。当第一误差diff_update1等于第二误差diff_update2时,则该目标误差可以是第一误差diff_update1或第二误差diff_update2。即目标误差diff_update可以按照如下公式进行确定:

diff_update = max(diff_update1, diff_update2)   公式(2-35)

其中,diff_update是指目标误差,diff_update1是指第一误差,diff_update2是指第二误差。
具体地,第一目标迭代间隔可以按照如下方式进行确定,即可以根据以下公式计算得到第一目标迭代间隔:

I = δ / diff_update − γ

其中,I表示第一目标迭代间隔,diff_update表示上述的目标误差,δ和γ可以为超参数。
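综合公式(2-35)与上式,第一目标迭代间隔的确定可以用如下Python草图示意(delta、gamma为超参数,"间隔至少包含1次迭代"为示意性假设):

```python
def first_target_interval(diff1: float, diff2: float, delta: float, gamma: float) -> int:
    """示意:目标误差取第一误差与第二误差中的最大值(公式(2-35)),
    第一目标迭代间隔与目标误差负相关。"""
    diff_update = max(diff1, diff2)
    return max(int(delta / diff_update - gamma), 1)   # 至少包含1次迭代(假设)
```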
可选地,上述实施例中,循环神经网络运算中数据位宽可变,并可以通过第二误差衡量数据位宽的变化趋势。此种情况下,处理器在确定第一目标迭代间隔后,可以确定第二目标迭代间隔以及确定第二目标迭代间隔中迭代对应的数据位宽,其中,该第二目标迭代间隔中迭代对应的数据位宽一致。具体地,处理器可以根据当前检验迭代的待量化数据,确定第二目标迭代间隔对应的数据位宽。也就是说,循环神经网络运算过程中的数据位宽每隔一个第二目标迭代间隔更新一次。可选地,该第二目标迭代间隔对应的数据位宽可以为当前检验迭代的目标数据位宽。该当前检验迭代的目标数据位宽可参见上文中的操作S114和S115,此处不再赘述。
例如,该第二目标迭代间隔可以从当前检验迭代开始计算迭代数量,即第二目标迭代间隔对应的检验迭代可以是第二目标迭代间隔的起始迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定第二目标迭代间隔的迭代间隔为6,则处理器可以确定该第二目标迭代间隔包括6次迭代,分别为第100次迭代至第105次迭代。此时,处理器可以确定第100次迭代的目标数据位宽,且第101次迭代至第105次迭代沿用该第100次迭代的目标数据位宽,无需在第101次迭代至第105次迭代计算目标数据位宽,从而减少计算量,提高量化效率及运算效率。之后,第106次迭代可以作为当前检验迭代,并重复上述确定第二目标迭代间隔,以及更新数据位宽的操作。
可选地,第二目标迭代间隔还可以是从当前检验迭代的下一次迭代开始计算迭代数量,即该第二目标迭代间隔对应的检验迭代也可以是该第二目标迭代间隔的终止迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定第二目标迭代间隔的迭代间隔为6。则处理器可以确定该第二目标迭代间隔包括6次迭代,分别为第101次迭代至第106次迭代。此时,处理器可以确定第100次迭代的目标数据位宽,且第101次迭代至106次迭代沿用该第100次迭代的目标数据位宽,无需在第101次迭代至106次迭代计算目标数据位宽,从而减少计算量,提高量化效率及运算效率。之后,第106次迭代可以作为当前检验迭代,并重复上述确定目标迭代间隔,以及更新数据位宽的操作。
再进一步地,处理器还可以在检验迭代处确定第二目标迭代间隔中的量化参数,以根据第二目标迭代间隔调整循环神经网络运算中的量化参数。即该循环神经网络运算中的点位置等量化参数可以与数据位宽同步更新。
在一种情况下,该第二目标迭代间隔中迭代对应的量化参数可以是一致的。可选地,处理器可以根据当前检验迭代的待量化数据和该当前检验迭代对应的目标数据位宽,确定当前检验迭代对应的点位置,并将该当前检验迭代对应的点位置作为该第二目标迭代间隔对应的点位置,其中该第二目标迭代间隔中迭代对应的点位置一致。也就是说,第二目标迭代间隔中的各次迭代均沿用当前检验迭代的点位置等量化参数,避免了每次迭代都对量化参数进行更新调整,从而减少了量化过程中的计算量,提高了量化操作的效率。
可选地,第二目标迭代间隔中迭代对应的缩放系数可以一致。处理器可以根据当前检验迭代的待 量化数据,确定当前检验迭代对应的缩放系数,并将该当前检验迭代对应的缩放系数作为第二目标迭代间隔中各次迭代的缩放系数。其中,该第二目标迭代间隔中迭代对应的缩放系数一致。
可选地,第二目标迭代间隔中迭代对应的偏移量一致。处理器可以根据当前检验迭代的待量化数据,确定当前检验迭代对应的偏移量,并将该当前检验迭代对应的偏移量作为第二目标迭代间隔中各次迭代的偏移量。进一步地,处理器还可以确定待量化数据所有元素中的最小值和最大值,并进一步确定点位置和缩放系数等量化参数,具体可参见上文中的描述。该第二目标迭代间隔中迭代对应的偏移量一致。
例如,该第二目标迭代间隔可以从当前检验迭代开始计算迭代数量,即第二目标迭代间隔对应的检验迭代可以是第二目标迭代间隔的起始迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定第二目标迭代间隔的迭代间隔为3,则处理器可以确定该第二目标迭代间隔包括3次迭代,分别为第100次迭代、第101次迭代和第102次迭代。进而处理器可以根据第100次迭代对应的待量化数据和目标数据位宽,确定该第100次迭代对应的点位置等量化参数,并可以采用该第100次迭代对应的点位置等量化参数对第100次迭代、第101次迭代和第102次迭代进行量化。这样,处理器在第101次迭代和第102次迭代无需计算点位置等量化参数,减少了量化过程中的计算量,提高了量化操作的效率。
可选地,第二目标迭代间隔还可以是从当前检验迭代的下一次迭代开始计算迭代数量,即该第二目标迭代间隔对应的检验迭代也可以是该第二目标迭代间隔的终止迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定第二目标迭代间隔的迭代间隔为3。则处理器可以确定该第二目标迭代间隔包括3次迭代,分别为第101次迭代、第102次迭代和第103次迭代。进而处理器可以根据第100次迭代对应的待量化数据和目标数据位宽,确定该第100次迭代对应的点位置等量化参数,并可以采用该第100次迭代对应的点位置等量化参数对第101次迭代、第102次迭代和第103次迭代进行量化。这样,处理器在第102次迭代和第103次迭代无需计算点位置等量化参数,减少了量化过程中的计算量,提高了量化操作的效率。
本公开实施例中,同一第二目标迭代间隔中各次迭代对应的数据位宽及量化参数均一致,即同一第二目标迭代间隔中各次迭代对应的数据位宽、点位置、缩放系数及偏移量均保持不变,从而在循环神经网络的训练或微调过程中,可以避免频繁地调整待量化数据的量化参数,减少了量化过程中的计算量,从而可以提高量化效率。并且,通过在训练或微调的不同阶段根据数据变动幅度,动态地调整量化参数,可以保证量化精度。
在另一种情况下,处理器还可以根据点位置等量化参数对应的点位置迭代间隔确定第二目标迭代间隔中的量化参数,以调整循环神经网络运算中的量化参数。即该循环神经网络运算中的点位置等量化参数可以与数据位宽异步更新,处理器可以在第二目标迭代间隔的检验迭代处更新数据位宽和点位置等量化参数,处理器还可以根据点位置迭代间隔单独更新第二目标迭代间隔中迭代对应的点位置。
具体地,处理器还可以根据当前检验迭代对应的目标数据位宽,确定第二目标迭代间隔对应的数据位宽,其中,第二目标迭代间隔中迭代对应的数据位宽一致。之后,处理器可以根据该第二目标迭代间隔对应的数据位宽和点位置迭代间隔,调整循环神经网络运算过程中的点位置等量化参数。在确定第二目标迭代间隔对应的数据位宽之后,根据获取的点位置迭代间隔和所述第二目标迭代间隔对应的数据位宽,调整所述第二目标迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置。其中,所述点位置迭代间隔包含至少一次迭代,所述点位置迭代间隔中迭代的点位置一致。可选地,该点位置迭代间隔可以是超参数,例如,该点位置迭代间隔可以是用户自定义输入的。
在一种可选的实施例中,上述的方法可以用于循环神经网络的训练或微调过程中,以实现对循环神经网络微调或训练过程涉及的运算数据的量化参数进行调整,以提高循环神经网络运算过程中涉及的运算数据的量化精度及效率。该运算数据可以是神经元数据、权值数据或梯度数据中的至少一种。如图3-5a所示,根据待量化数据的数据变动曲线可知,在训练或微调的初期阶段,各次迭代的待量化数据之间的差异性较大,待量化数据的数据变动幅度较为剧烈,此时目标迭代间隔的值可以较小,以及时地更新目标迭代间隔中的量化参数,保证量化精度。在训练或微调的中期阶段,待量化数据的数据变动幅度逐渐趋于平缓,此时可以增大目标迭代间隔的值,以避免频繁地更新量化参数,以提高量化效率及运算效率。在训练或微调的后期阶段,此时循环神经网络的训练或微调趋于稳定(即当循环神经网络的正向运算结果趋近于预设参考值时,该循环神经网络的训练或微调趋于稳定),此时可以继续增大目标迭代间隔的值,以进一步提高量化效率及运算效率。基于上述数据变动趋势,可以在循环神经网络的训练或微调的不同阶段采用不同的方式确定目标迭代间隔,以在保证量化精度的基础上,提高量化效率及运算效率。
进一步地,图3-17示出本公开再一实施例的量化参数调整方法的流程图,如图3-17所示,上述方法还可以包括:
在当前迭代大于第一预设迭代时,处理器还可以执行操作S712,即处理器可以进一步确定当前迭代是否大于第二预设迭代。其中,所述第二预设迭代大于所述第一预设迭代,所述第二预设迭代间隔大于所述第一预设迭代间隔。可选地,上述第二预设迭代可以是超参数,第二预设迭代可以大于至少一个训练周期的迭代总数。可选地,第二预设迭代可以根据待量化数据的数据变动曲线确定。可选地,第二预设迭代也可以是用户自定义设置的。
在所述当前迭代大于或等于第二预设迭代时,则处理器可以执行操作S714,将第二预设迭代间隔作为所述目标迭代间隔,并根据所述第二预设迭代间隔调整所述神经网络量化过程中的参数。在当前迭代大于第一预设迭代,且当前迭代小于第二预设迭代时,则处理器可以执行上述的操作S713,根据所述待量化数据的数据变动幅度确定目标迭代间隔,并根据所述目标迭代间隔调整量化参数。
可选地,处理器可以读取用户设置的第二预设迭代,并根据第二预设迭代与第二预设迭代间隔的对应关系,确定第二预设迭代间隔,该第二预设迭代间隔大于第一预设迭代间隔。可选地,当所述神经网络的收敛程度满足预设条件时,则确定所述当前迭代大于或等于第二预设迭代。例如,在当前迭代的正向运算结果趋近于预设参考值时,可以确定该神经网络的收敛程度满足预设条件,此时可以确定当前迭代大于或等于第二预设迭代。或者,在当前迭代对应的损失值小于或等于预设阈值时,则可以确定该神经网络的收敛程度满足预设条件。
可选地,上述的第二预设迭代间隔可以是超参数,该第二预设迭代间隔可以大于或等于至少一个训练周期的迭代总数。可选地,该第二预设迭代间隔可以是用户自定义设置的。处理器可以直接读取用户输入的第二预设迭代和第二预设迭代间隔,并根据该第二预设迭代间隔更新神经网络运算中的量化参数。例如,该第二预设迭代间隔可以等于一个训练周期的迭代总数,即每个训练周期(epoch)更新一次量化参数。
再进一步地,上述方法还包括:
当所述当前迭代大于或等于第二预设迭代,处理器还可以在每次检验迭代处确定当前数据位宽是否需要调整。如果当前数据位宽需要调整,则处理器可以从上述的操作S714切换至操作S713,以重新确定数据位宽,使得数据位宽能够满足待量化数据的需求。
具体地,处理器可以根据上述的第二误差确定数据位宽是否需要调整。处理器还可以执行上述操作S715,确定第二误差是否大于预设误差值,当所述当前迭代大于或等于第二预设迭代且所述第二误差大于预设误差值时,则切换执行操作S713,根据所述待量化数据的数据变动幅度确定迭代间隔,以根据所述迭代间隔重新确定所述数据位宽。若当前迭代大于或等于第二预设迭代,且第二误差小于或等于预设误差值,则继续执行操作S714,将第二预设迭代间隔作为所述目标迭代间隔,并根据所述第二预设迭代间隔调整所述神经网络量化过程中的参数。其中,预设误差值可以是根据量化误差对应的预设阈值确定的,当第二误差大于预设误差值时,此时说明数据位宽可能需要进一步调整,处理器可以根据所述待量化数据的数据变动幅度确定迭代间隔,以根据所述迭代间隔重新确定所述数据位宽。
例如,第二预设迭代间隔为一个训练周期的迭代总数。在当前迭代大于或等于第二预设迭代时,处理器可以按照第二预设迭代间隔更新量化参数,即每个训练周期(epoch)更新一次量化参数。此时,每个训练周期的起始迭代作为一个检验迭代,在每个训练周期的起始迭代处,处理器可以根据该检验迭代的待量化数据确定量化误差,根据量化误差确定第二误差,并根据如下公式确定第二误差是 否大于预设误差值:
diff_update2 = θ × diff_bit² > T

其中,diff_update2表示第二误差,diff_bit表示量化误差,θ表示超参数,T表示预设误差值。可选地,该预设误差值可以等于第一预设阈值除以超参数。当然,该预设误差值也可以是超参数。例如,该预设误差值可以按照如下公式计算获得:T=th/10,其中,th表示第一预设阈值,超参数的取值为10。
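该判断逻辑可以用如下Python小草图示意(其中预设误差值按上文示例取T=th/10,函数名与取值均为示意性假设):

```python
def need_redetermine_bit_width(diff_bit: float, theta: float, th: float) -> bool:
    """示意:按 diff_update2 = theta * diff_bit**2 > T 判断是否需要
    切换到根据数据变动幅度确定迭代间隔并重新确定数据位宽。"""
    t = th / 10.0                        # 预设误差值T=第一预设阈值/超参数10(示例取值)
    return theta * diff_bit ** 2 > t
```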
若第二误差diff update2大于预设误差值T,则说明数据位宽可能不能满足预设要求,此时,可以不再采用第二预设迭代间隔更新量化参数,处理器可以按照待量化数据的数据变动幅度确定目标迭代间隔,以保证数据位宽满足预设要求。即在第二误差diff update2大于预设误差值T时,处理器从上述的操作S714切换至上述的操作S713。
当然,在其他实施例中,处理器可以根据上述的量化误差,确定数据位宽是否需要调整。例如,第二预设迭代间隔为一个训练周期的迭代总数。在当前检验迭代大于或等于第二预设迭代时,处理器可以按照第二预设迭代间隔更新量化参数,即每个训练周期(epoch)更新一次量化参数。其中,每个训练周期的起始迭代作为一个检验迭代。在每个训练周期的起始迭代处,处理器可以根据该检验迭代的待量化数据确定量化误差,并在该量化误差大于或等于第一预设阈值时,则说明数据位宽可能不能满足预设要求,即处理器从上述的操作S714切换至上述的操作S713。
在一个可选的实施例中,上述的点位置、缩放系数和偏移量等量化参数可以通过显示装置进行显示。此时,用户可以通过显示装置获知循环神经网络运算过程中的量化参数,用户还可以自适应修改处理器确定的量化参数。同理,上述的数据位宽和目标迭代间隔等也可以通过显示装置进行显示。此时,用户可以通过显示装置获知循环神经网络运算过程中的目标迭代间隔和数据位宽等参数,用户还可以自适应修改处理器确定的目标迭代间隔和数据位宽等参数。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本公开并不受所描述的动作顺序的限制,因为依据本公开,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于可选实施例,所涉及的动作和模块并不一定是本公开所必须的。
本公开一实施例还提供了一种循环神经网络的量化参数调整装置200,该量化参数调整装置200可以设置于一处理器中。例如,该量化参数调整装置200可以置于通用处理器中,再如,该量化参数调整装置也可以置于一人工智能处理器中。图3-18示出本公开一实施例的量化参数调整装置的结构框图,如图3-18所示,该装置包括获取模块210和迭代间隔确定模块220。
获取模块210,用于获取待量化数据的数据变动幅度;
迭代间隔确定模块220,用于根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,其中,所述目标迭代间隔包括至少一次迭代,所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
在一种可能的实现方式中,所述装置还包括:
预设间隔确定模块,用于在当前检验迭代小于或等于第一预设迭代时,根据预设迭代间隔调整所述量化参数。
在一种可能的实现方式中,所述迭代间隔确定模块,还用于在当前检验迭代大于第一预设迭代时,根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
在一种可能的实现方式中,所述迭代间隔确定模块,包括:
第二目标迭代间隔确定子模块,在当前检验迭代大于或等于第二预设迭代、且当前检验迭代需要进行量化参数调整时,根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代 对应的第二目标迭代间隔;
更新迭代确定子模块,根据所述第二目标迭代间隔确定出与所述当前检验迭代相对应的更新迭代,以在所述更新迭代中调整所述量化参数,所述更新迭代为所述当前检验迭代之后的迭代;
其中,所述第二预设迭代大于第一预设迭代,所述循环神经网络的量化调整过程包括多个周期,所述多个周期中迭代的总数不一致。
在一种可能的实现方式中,所述第二目标迭代间隔确定子模块,包括:
更新周期确定子模块,根据当前检验迭代在当前周期中的迭代排序数和所述当前周期之后的周期中迭代的总数,确定出对应所述当前检验迭代的更新周期,所述更新周期中迭代的总数大于或等于所述迭代排序数;
确定子模块,根据所述第一目标迭代间隔、所述迭代排序数、所述当前周期与更新周期之间的周期中迭代的总数,确定出所述第二目标迭代间隔。
在一种可能的实现方式中,所述迭代间隔确定模块,还用于当所述循环神经网络的收敛程度满足预设条件时,则确定所述当前检验迭代大于或等于第二预设迭代。
在一种可能的实现方式中,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述装置还包括:
量化参数确定模块,用于根据当前检验迭代对应的目标数据位宽和所述当前检验迭代的待量化数据,确定参考迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置;
其中,所述参考迭代间隔中迭代对应的点位置一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔。
在一种可能的实现方式中,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述装置还包括:
数据位宽确定模块,用于根据所述当前检验迭代对应的目标数据位宽,确定参考迭代间隔对应的数据位宽,其中,所述参考迭代间隔中迭代对应的数据位宽一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔;
量化参数确定模块,用于根据获取的点位置迭代间隔和参考迭代间隔对应的数据位宽,调整所述参考迭代间隔中迭代对应的点位置,以调整所述神经网络运算中的点位置;
其中,所述点位置迭代间隔包含至少一次迭代,所述点位置迭代间隔中迭代的点位置一致。
在一种可能的实现方式中,所述点位置迭代间隔小于或等于所述参考迭代间隔。
在一种可能的实现方式中,所述量化参数还包括缩放系数,所述缩放系数与所述点位置同步更新。
在一种可能的实现方式中,所述量化参数还包括偏移量,所述偏移量与所述点位置同步更新。
在一种可能的实现方式中,所述数据位宽确定模块包括:
量化误差确定子模块,用于根据所述当前检验迭代的待量化数据和所述当前检验迭代的量化数据,确定量化误差,其中,所述当前检验迭代的量化数据对所述当前检验迭代的待量化数据进行量化获得;
数据位宽确定子模块,用于根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽。
在一种可能的实现方式中,所述数据位宽确定单元用于根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽时,具体用于:
若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽;或者,
若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽。
在一种可能的实现方式中,所述数据位宽确定单元用于若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽时,具体用于:
若所述量化误差大于或等于第一预设阈值,则根据第一预设位宽步长确定第一中间数据位宽;
返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差小于所述第一预设阈值;其中,所述当前检验迭代的量化数据是根据所述第一中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
在一种可能的实现方式中,所述数据位宽确定单元用于若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽时,具体用于:
若所述量化误差小于或等于第二预设阈值,则根据第二预设位宽步长确定第二中间数据位宽;
返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差大于所述第二预设阈值;其中,所述当前检验迭代的量化数据是根据所述第二中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
在一种可能的实现方式中,所述获取模块包括:
第一获取模块,用于获取点位置的变动幅度;其中,所述点位置的变动幅度能够用于表征所述待量化数据的数据变动幅度,所述点位置的变动幅度与所述待量化数据的数据变动幅度正相关。
在一种可能的实现方式中,所述第一获取模块包括:
第一均值确定单元,用于根据当前检验迭代之前的上一检验迭代对应的点位置,以及所述上一检验迭代之前的历史迭代对应的点位置,确定第一均值,其中,所述上一检验迭代为所述目标迭代间隔之前的上一迭代间隔对应的检验迭代;
第二均值确定单元,用于根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值;其中,所述当前检验迭代对应的点位置根据所述当前检验迭代对应的目标数据位宽和待量化数据确定;
第一误差确定单元,用于根据所述第一均值和所述第二均值确定第一误差,所述第一误差用于表征所述点位置的变动幅度。
在一种可能的实现方式中,所述第二均值确定单元具体用于:
获取预设数量的中间滑动平均值,其中,各个所述中间滑动平均值是根据所述当前检验迭代之前所述预设数量的检验迭代确定的;
根据所述当前检验迭代的点位置以及所述预设数量的中间滑动平均值,确定所述第二均值。
在一种可能的实现方式中,所述第二均值确定单元具体用于根据所述当前检验迭代对应的点位置以及所述第一均值,确定所述第二均值。
在一种可能的实现方式中,所述第二均值确定单元用于根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值;
其中,所述当前检验迭代的数据位宽调整值根据所述当前检验迭代的目标数据位宽和初始数据位宽确定。
在一种可能的实现方式中,所述第二均值确定单元用于根据获取的所述当前检验迭代的数据位 宽调整值,更新所述第二均值时,具体用于:
当所述当前检验迭代的数据位宽调整值大于预设参数时,则根据所述当前检验迭代的数据位宽调整值减小所述第二均值;
当所述当前检验迭代的数据位宽调整值小于预设参数时,则根据所述当前检验迭代的数据位宽调整值增大所述第二均值。
在一种可能的实现方式中,所述迭代间隔确定模块用于根据所述第一误差确定所述目标迭代间隔,所述目标迭代间隔与所述第一误差负相关。
在一种可能的实现方式中,所述获取模块还包括:
第二获取模块,用于获取数据位宽的变化趋势;根据所述点位置的变动幅度和所述数据位宽的变化趋势,确定所述待量化数据的数据变动幅度。
在一种可能的实现方式中,所述迭代间隔确定模块还用于根据获取的第一误差和第二误差,确定所述目标迭代间隔;其中,所述第一误差用于表征点位置的变动幅度,所述第二误差用于表征数据位宽的变化趋势。
在一种可能的实现方式中,所述迭代间隔确定模块用于根据获取的第一误差和第二误差,确定所述目标迭代间隔时,具体用于:
将所述第一误差和所述第二误差中最大值作为目标误差;
根据所述目标误差确定所述目标迭代间隔,其中,所述目标误差与所述目标迭代间隔负相关。
在一种可能的实现方式中,所述第二误差根据量化误差确定;
其中,所述量化误差根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据确定,所述第二误差与所述量化误差正相关。
在一种可能的实现方式中,所述迭代间隔确定模块,还用于当所述当前检验迭代大于或等于第二预设迭代,且第二误差大于预设误差值时,则根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
应当清楚的是,本申请实施例各个模块或单元的工作原理与上述方法中各个操作的实现过程基本一致,具体可参见上文的描述,此处不再赘述。应该理解,上述的装置实施例仅是示意性的,本公开的装置还可通过其它的方式实现。例如,上述实施例中所述单元/模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。例如,多个单元、模块或组件可以结合,或者可以集成到另一个系统,或一些特征可以忽略或不执行。上述集成的单元/模块既可以采用硬件的形式实现,也可以采用软件程序模块的形式实现。所述集成的单元/模块如果以硬件的形式实现时,该硬件可以是数字电路,模拟电路等等。硬件结构的物理实现包括但不局限于晶体管,忆阻器等等。
所述集成的单元/模块如果以软件程序模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储器中。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储器中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储器包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
在一个实施例中,本公开还提供了一种计算机可读存储介质,该存储介质中存储有计算机程序,该计算机程序被处理器或装置执行时,实现如上述任一实施例中的方法。具体地,该计算机程序被处 理器或装置执行时,实现如下方法:
获取待量化数据的数据变动幅度;
根据所述待量化数据的数据变动幅度,确定目标迭代间隔,以根据所述目标迭代间隔调整所述循环神经网络运算中的量化参数,其中,所述目标迭代间隔包括至少一次迭代,所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
应当清楚的是,本申请实施例各个操作的实现与上述方法中各个操作的实现过程基本一致,具体可参见上文的描述,此处不再赘述。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。上述实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
依照以下条款可以更好地理解本公开的内容:
条款B1.一种循环神经网络的量化参数调整方法,所述方法包括:
获取待量化数据的数据变动幅度;
根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,其中,所述第一目标迭代间隔包括至少一次迭代,所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
条款B2.根据条款B1所述的方法,所述方法还包括:
在当前检验迭代小于或等于第一预设迭代时,根据预设迭代间隔调整所述量化参数。
条款B3.根据条款B1所述的方法,根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,包括:
在当前检验迭代大于第一预设迭代时,根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
条款B4.根据条款B1至条款B3任一项所述的方法,根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,包括:
在当前检验迭代大于或等于第二预设迭代、且当前检验迭代需要进行量化参数调整时,根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔;
根据所述第二目标迭代间隔确定出与所述当前检验迭代相对应的更新迭代,以在所述更新迭代中调整所述量化参数,所述更新迭代为所述当前检验迭代之后的迭代;
其中,所述第二预设迭代大于第一预设迭代,所述循环神经网络的量化调整过程包括多个周期,所述多个周期中迭代的总数不一致。
条款B5.根据条款B4所述的方法,根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔,包括:
根据当前检验迭代在当前周期中的迭代排序数和所述当前周期之后的周期中迭代的总数,确定出对应所述当前检验迭代的更新周期,所述更新周期中迭代的总数大于或等于所述迭代排序数;
根据所述第一目标迭代间隔、所述迭代排序数、所述当前周期与更新周期之间的周期中迭代的总数,确定出所述第二目标迭代间隔。
条款B6.根据条款B4所述的方法,根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,还包括:
当所述循环神经网络的收敛程度满足预设条件时,则确定所述当前检验迭代大于或等于第二预设迭代。
条款B7.根据条款B4所述的方法,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述方法还包括:
根据当前检验迭代对应的目标数据位宽和所述当前检验迭代的待量化数据,确定参考迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置;
其中,所述参考迭代间隔中迭代对应的点位置一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔。
条款B8.根据条款B4所述的方法,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述方法还包括:
根据所述当前检验迭代对应的目标数据位宽,确定参考迭代间隔对应的数据位宽,其中,所述参考迭代间隔中迭代对应的数据位宽一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔;
根据获取的点位置迭代间隔和所述参考迭代间隔对应的数据位宽,调整所述参考迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置;
其中,所述点位置迭代间隔包含至少一次迭代,所述点位置迭代间隔中迭代的点位置一致。
条款B9.根据条款B8所述的方法,所述点位置迭代间隔小于或等于所述参考迭代间隔。
条款B10.根据条款B7至条款B9任一项所述的方法,所述量化参数还包括缩放系数,所述缩放系数与所述点位置同步更新。
条款B11.根据条款B7至条款B9任一项所述的方法,所述量化参数还包括偏移量,所述偏移量与所述点位置同步更新。
条款B12.根据条款B7至条款B9任一项所述的方法,所述方法还包括:
根据所述当前检验迭代的待量化数据和所述当前检验迭代的量化数据,确定量化误差,其中,所述当前检验迭代的量化数据对所述当前检验迭代的待量化数据进行量化获得;
根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽。
条款B13.根据条款B12所述的方法,所述根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽,包括:
若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽;或者,
若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽。
条款B14.根据条款B13所述的方法,若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽,包括:
若所述量化误差大于或等于第一预设阈值,则根据第一预设位宽步长确定第一中间数据位宽;
返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差小于所述第一预设阈值;其中,所述当前检验迭代的量化数据是根据所述第一中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
条款B15.根据条款B13所述的方法,所述若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,包括:
若所述量化误差小于或等于第二预设阈值,则根据第二预设位宽步长确定第二中间数据位宽;
返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差大于所述第二预设阈值;其中,所述当前检验迭代的量化数据是根据所述第二中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
条款B16.根据条款B1至条款B15任一项所述的方法,所述获取待量化数据的数据变动幅度,包括:
获取点位置的变动幅度;其中,所述点位置的变动幅度能够用于表征所述待量化数据的数据变动幅度,所述点位置的变动幅度与所述待量化数据的数据变动幅度正相关。
条款B17.根据条款B16所述的方法,所述获取点位置的变动幅度,包括:
根据当前检验迭代之前的上一检验迭代对应的点位置,以及所述上一检验迭代之前的历史迭代对应的点位置,确定第一均值,其中,所述上一检验迭代为所述参考迭代间隔之前的上一迭代间隔对应的检验迭代;
根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值;其中,所述当前检验迭代对应的点位置根据所述当前检验迭代对应的目标数据位宽和待量化数据确定;
根据所述第一均值和所述第二均值确定第一误差,所述第一误差用于表征所述点位置的变动幅度。
条款B18.根据条款B17所述的方法,根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值,包括:
获取预设数量的中间滑动平均值,其中,各个所述中间滑动平均值是根据所述当前检验迭代之前所述预设数量的检验迭代确定的;
根据所述当前检验迭代的点位置以及所述预设数量的中间滑动平均值,确定所述第二均值。
条款B19.根据条款B17所述的方法,所述根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值,包括:
根据所述当前检验迭代对应的点位置以及所述第一均值,确定所述第二均值。
条款B20.根据条款B17所述的方法,所述方法还包括:
根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值;其中,所述当前检验迭代的数据位宽调整值根据所述当前检验迭代的目标数据位宽和初始数据位宽确定。
条款B21.根据条款B20所述的方法,所述根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值,包括:
当所述当前检验迭代的数据位宽调整值大于预设参数时,则根据所述当前检验迭代的数据位宽调整值减小所述第二均值;
当所述当前检验迭代的数据位宽调整值小于预设参数时,则根据所述当前检验迭代的数据位宽调整值增大所述第二均值。
条款B22.根据条款B17所述的方法,所述根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,包括:
根据所述第一误差确定所述第一目标迭代间隔,所述第一目标迭代间隔与所述第一误差负相关。
条款B23.根据条款B16至条款B22任一项所述的方法,所述获取待量化数据的数据变动幅度,还包括:
获取数据位宽的变化趋势;
根据所述点位置的变动幅度和所述数据位宽的变化趋势,确定所述待量化数据的数据变动幅度。
条款B24.根据条款B23所述的方法,根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,还包括:
根据获取的第一误差和第二误差,确定所述第一目标迭代间隔;其中,所述第一误差用于表征点位置的变动幅度,所述第二误差用于表征数据位宽的变化趋势。
条款B25.根据条款B23所述的方法,根据获取的第一误差和第二误差,确定所述第一目标迭代间隔,包括:
将所述第一误差和所述第二误差中最大值作为目标误差;
根据所述目标误差确定所述第一目标迭代间隔,其中,所述目标误差与所述第一目标迭代间隔负相关。
条款B26.根据条款B24或条款B25所述的方法,所述第二误差根据量化误差确定;
其中,所述量化误差根据当前检验迭代中待量化数据和所述当前检验迭代的量化数据确定,所述第二误差与所述量化误差正相关。
条款B27.根据条款B4所述的方法,所述方法还包括:
当所述当前检验迭代大于或等于第二预设迭代,且第二误差大于预设误差值时,则根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
条款B28.根据条款B1-条款B27任一项所述的方法,所述待量化数据为神经元数据、权值数据或梯度数据中的至少一种。
条款B29.一种循环神经网络的量化参数调整装置,包括存储器和处理器,所述存储器存储有计算机程序,所述处理器执行所述计算机程序时,实现如条款B1-条款B28任一项所述的方法的步骤。
条款B30.一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,所述计算机程序被执行时,实现如条款B1-条款B28任一项所述的方法的步骤。
条款B31.一种循环神经网络的量化参数调整装置,所述装置包括:
获取模块,用于获取待量化数据的数据变动幅度;
迭代间隔确定模块,用于根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,其中,所述目标迭代间隔包括至少一次迭代,所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
条款B32.根据条款B31所述的装置,所述装置还包括:
预设间隔确定模块,用于在当前检验迭代小于或等于第一预设迭代时,根据预设迭代间隔调整所述量化参数。
条款B33.根据条款B31所述的装置,
所述迭代间隔确定模块,还用于在当前检验迭代大于第一预设迭代时,根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
条款B34.根据条款B31至条款B33任一项所述的装置,所述迭代间隔确定模块,包括:
第二目标迭代间隔确定子模块,在当前检验迭代大于或等于第二预设迭代、且当前检验迭代需要进行量化参数调整时,根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔;
更新迭代确定子模块,根据所述第二目标迭代间隔确定出与所述当前检验迭代相对应的更新迭代,以在所述更新迭代中调整所述量化参数,所述更新迭代为所述当前检验迭代之后的迭代;
其中,所述第二预设迭代大于第一预设迭代,所述循环神经网络的量化调整过程包括多个周期,所述多个周期中迭代的总数不一致。
条款B35.根据条款B34所述的装置,所述第二目标迭代间隔确定子模块,包括:
更新周期确定子模块,根据当前检验迭代在当前周期中的迭代排序数和所述当前周期之后的周期中迭代的总数,确定出对应所述当前检验迭代的更新周期,所述更新周期中迭代的总数大于或等于所述迭代排序数;
确定子模块,根据所述第一目标迭代间隔、所述迭代排序数、所述当前周期与更新周期之间的周期中迭代的总数,确定出所述第二目标迭代间隔。
条款B36.根据条款B34所述的装置,
所述迭代间隔确定模块,还用于当所述循环神经网络的收敛程度满足预设条件时,则确定所述当前检验迭代大于或等于第二预设迭代。
条款B37.根据条款B34所述的装置,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述装置还包括:
量化参数确定模块,用于根据当前检验迭代对应的目标数据位宽和所述当前检验迭代的待量化数据,确定参考迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置;
其中,所述参考迭代间隔中迭代对应的点位置一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔。
条款B38.根据条款B34所述的装置,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述装置还包括:
数据位宽确定模块,用于根据所述当前检验迭代对应的目标数据位宽,确定参考迭代间隔对应的数据位宽,其中,所述参考迭代间隔中迭代对应的数据位宽一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔;
量化参数确定模块,用于根据获取的点位置迭代间隔和参考迭代间隔对应的数据位宽,调整所述参考迭代间隔中迭代对应的点位置,以调整所述神经网络运算中的点位置;
其中,所述点位置迭代间隔包含至少一次迭代,所述点位置迭代间隔中迭代的点位置一致。
条款B39.根据条款B38所述的装置,所述点位置迭代间隔小于或等于所述参考迭代间隔。
条款B40.根据条款B37至条款B39任一项所述的装置,所述量化参数还包括缩放系数,所述缩放系数与所述点位置同步更新。
条款B41.根据条款B37至条款B39任一项所述的装置,所述量化参数还包括偏移量,所述偏移量与所述点位置同步更新。
条款B42.根据条款B37至条款B39任一项所述的装置,所述数据位宽确定模块包括:
量化误差确定子模块,用于根据所述当前检验迭代的待量化数据和所述当前检验迭代的量化数据,确定量化误差,其中,所述当前检验迭代的量化数据对所述当前检验迭代的待量化数据进行量化获得;
数据位宽确定子模块,用于根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽。
条款B43.根据条款B42所述的装置,所述数据位宽确定单元用于根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽时,具体用于:
若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述 当前检验迭代对应的目标数据位宽;或者,
若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽。
条款B44.根据条款B43所述的装置,所述数据位宽确定单元用于若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽时,具体用于:
若所述量化误差大于或等于第一预设阈值,则根据第一预设位宽步长确定第一中间数据位宽;
返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差小于所述第一预设阈值;其中,所述当前检验迭代的量化数据是根据所述第一中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
条款B45.根据条款B43所述的装置,所述数据位宽确定单元用于若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽时,具体用于:
若所述量化误差小于或等于第二预设阈值,则根据第二预设位宽步长确定第二中间数据位宽;
返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差大于所述第二预设阈值;其中,所述当前检验迭代的量化数据是根据所述第二中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
条款B46.根据条款B31至条款B45任一项所述的装置,所述获取模块包括:
第一获取模块,用于获取点位置的变动幅度;其中,所述点位置的变动幅度能够用于表征所述待量化数据的数据变动幅度,所述点位置的变动幅度与所述待量化数据的数据变动幅度正相关。
条款B47.根据条款B46所述的装置,所述第一获取模块包括:
第一均值确定单元,用于根据当前检验迭代之前的上一检验迭代对应的点位置,以及所述上一检验迭代之前的历史迭代对应的点位置,确定第一均值,其中,所述上一检验迭代为所述目标迭代间隔之前的上一迭代间隔对应的检验迭代;
第二均值确定单元,用于根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值;其中,所述当前检验迭代对应的点位置根据所述当前检验迭代对应的目标数据位宽和待量化数据确定;
第一误差确定单元,用于根据所述第一均值和所述第二均值确定第一误差,所述第一误差用于表征所述点位置的变动幅度。
条款B48.根据条款B47所述的装置,所述第二均值确定单元具体用于:
获取预设数量的中间滑动平均值,其中,各个所述中间滑动平均值是根据所述当前检验迭代之前所述预设数量的检验迭代确定的;
根据所述当前检验迭代的点位置以及所述预设数量的中间滑动平均值,确定所述第二均值。
条款B49.根据条款B47所述的装置,所述第二均值确定单元具体用于根据所述当前检验迭代对应的点位置以及所述第一均值,确定所述第二均值。
条款B50.根据条款B47所述的装置,所述第二均值确定单元用于根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值;
其中,所述当前检验迭代的数据位宽调整值根据所述当前检验迭代的目标数据位宽和初始数据位宽确定。
条款B51.根据条款B50所述的装置,所述第二均值确定单元用于根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值时,具体用于:
当所述当前检验迭代的数据位宽调整值大于预设参数时,则根据所述当前检验迭代的数据位宽调整值减小所述第二均值;
当所述当前检验迭代的数据位宽调整值小于预设参数时,则根据所述当前检验迭代的数据位宽调整值增大所述第二均值。
条款B52.根据条款B47所述的装置,所述迭代间隔确定模块用于根据所述第一误差确定所述目标迭代间隔,所述目标迭代间隔与所述第一误差负相关。
条款B53.根据条款B46至条款B52任一项所述的装置,所述获取模块还包括:
第二获取模块,用于获取数据位宽的变化趋势;根据所述点位置的变动幅度和所述数据位宽的变化趋势,确定所述待量化数据的数据变动幅度。
条款B54.根据条款B53所述的装置,所述迭代间隔确定模块还用于根据获取的第一误差和第二误差,确定所述目标迭代间隔;其中,所述第一误差用于表征点位置的变动幅度,所述第二误差用于表征数据位宽的变化趋势。
条款B55.根据条款B53所述的装置,所述迭代间隔确定模块用于根据获取的第一误差和第二误差,确定所述目标迭代间隔时,具体用于:
将所述第一误差和所述第二误差中最大值作为目标误差;
根据所述目标误差确定所述目标迭代间隔,其中,所述目标误差与所述目标迭代间隔负相关。
条款B56.根据条款B54或条款55所述的装置,所述第二误差根据量化误差确定;
其中,所述量化误差根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据确定,所述第二误差与所述量化误差正相关。
条款B57.根据条款B34所述的装置,
所述迭代间隔确定模块,还用于当所述当前检验迭代大于或等于第二预设迭代,且第二误差大于预设误差值时,则根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
随着神经网络运算复杂度的提高,数据的数据量和数据维度也在不断增大,而传统的神经网络算法通常采用浮点数据格式来执行神经网络运算,这就使得不断增大的数据量等对运算装置的数据处理效率、存储装置的存储容量及访存效率等提出了较大的挑战。为解决上述问题,相关技术中,将神经网络运算过程涉及的全部数据均由浮点数转化为定点数,但由于不同的数据之间具有差异性,或者,同一数据在不同阶段具有差异性,仅"由浮点数转化为定点数"时,往往会导致精度不够,从而会影响运算结果。
神经网络中待运算数据通常为浮点数据格式或精度较高的定点数据格式,在承载神经网络的装置中运行神经网络时,浮点数据格式或精度较高的定点数据格式的各种待运算数据,导致神经网络运行的运算量和访存开销都较大。为提高运算效率,本公开实施例所提供的神经网络量化方法、装置、计算机设备和存储介质,可以根据不同类待运算数据进行神经网络中的待运算数据的局部量化,量化后的数据格式通常为位宽较短、精度较低的定点数据格式。利用精度较低的量化后数据执行神经网络的运算,可以降低运算量和访存量。量化后的数据格式可以为位宽较短的定点数据格式。可以将浮点数据格式的待运算数据量化为定点数据格式的待运算数据,也可以将精度较高的定点格式的待运算数据量化为精度较低的定点格式的待运算数据。利用对应的量化参数对数据进行局部量化,在保证精度的同时,减小了存储数据所占用的存储空间,保证了运算结果的准确性和可靠性,且能够提高运算的效 率,且量化同样缩减了神经网络模型的大小,降低了对运行该神经网络模型的终端的性能要求,使神经网络模型可以应用于算力、体积、功耗相对受限的手机等终端。
可以理解的是,量化精度即量化后数据与量化前数据之间的误差的大小。量化精度可以影响神经网络运算结果的准确度。量化精度越高,运算结果的准确率越高,但运算量更大、访存开销也更大。相较于位宽较短的量化后数据,位宽较长的量化后数据的量化精度更高,用于执行神经网络的运算时准确率也更高。但在用于进行神经网络的运算时,位宽较长的量化后数据运算量更大、访存开销也较大,运算效率较低。同理,对于相同的待量化数据,采用不同的量化参数得到的量化后数据有不同的量化精度,将产生不同的量化结果,对运算效率和运算结果的准确率也会带来不同的影响。对神经网络进行量化,在运算效率和运算结果的准确率之间进行平衡,可以采用更加符合待运算数据的数据特征的量化后数据位宽和量化参数。
神经网络中的待运算数据可以包括权值、神经元、偏置、梯度中的至少一种。待运算数据为包含多个元素的矩阵。在传统的神经网络量化中,通常将待运算数据的整体进行量化后进行运算。而在利用量化后的待运算数据进行运算时,通常利用整体量化后的待运算数据中的一部分数据进行运算。例如,在卷积层,利用整体量化后的输入神经元进行卷积运算时,根据卷积核的维度和步长,在整体量化后的输入神经元中分别提取与卷积核的维度相当的量化后的神经元进行卷积运算。在全连接层,利用整体量化后的输入神经元进行矩阵乘运算时,在整体量化后的输入神经元中分别按行提取量化后的神经元进行矩阵乘的运算。因此,在传统的神经网络量化方法中,将待运算数据的整体进行量化后再按照部分量化后的数据进行运算,整体的运算效率较低。且将待运算数据的整体量化后再进行运算,需要将整体量化后的待运算数据进行存储,占用的存储空间较大。
根据本公开实施例的神经网络量化方法可应用于处理器中,该处理器可以是通用处理器,例如CPU(Central Processing Unit,中央处理器),也可以是用于执行人工智能运算的人工智能处理器(IPU)。人工智能运算可包括机器学习运算,类脑运算等。其中,机器学习运算包括神经网络运算、k-means运算、支持向量机运算等。该人工智能处理器可例如包括GPU(Graphics Processing Unit,图形处理单元)、NPU(Neural-Network Processing Unit,神经网络处理单元)、DSP(Digital Signal Process,数字信号处理单元)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)芯片中的一种或组合。本公开对处理器的具体类型不作限制。
在一种可能的实现方式中,本公开中所提及的处理器可包括多个处理单元,每个处理单元可以独立运行所分配到的各种任务,如:卷积运算任务、池化任务或全连接任务等。本公开对处理单元及处理单元所运行的任务不作限制。
图4-1示出根据本公开实施例的神经网络量化方法的流程图。如图4-1所示,该方法可以应用于神经网络中的任一层,该方法包括步骤S3-11至步骤S3-13。该方法可以应用于图1所示的处理器100。其中,处理单元101用于执行步骤S3-11至步骤S3-13。存储单元102用于存储待量化数据、量化参数、数据位宽等与步骤S3-11至步骤S3-13的处理过程相关的数据。
在步骤S3-11中,在所述待量化层的目标数据中确定多个待量化数据,各所述待量化数据均为所述目标数据的子集,所述目标数据为所述待量化层的任意一种待量化的待运算数据,所述待运算数据包括输入神经元、权值、偏置、梯度中的至少一种。
神经网络中的待量化层可以为神经网络中的任意一层。可以根据需求将神经网络中的部分层或全部层确定为待量化层。当神经网络中包括多个待量化层时,各待量化层可以连续也可以不连续。根据神经网络的不同,待量化层的种类也可以不同,例如待量化层可以为卷积层、全连接层等,本公开对 待量化层的数量及类型不作限定。
在一种可能的实现方式中,所述待运算数据包括神经元、权值、偏置、梯度中的至少一种。可以根据需求将待量化层中的神经元、权值、偏置、梯度中的至少一种进行量化。目标数据为任意一种待量化的待运算数据。例如,待运算数据为神经元、权值和偏置,需要将神经元和权值进行量化,则神经元为目标数据1,权值为目标数据2。
当待量化层中有多种目标数据时,针对每种目标数据可以采用本公开中的量化方法进行量化后,得到与各目标数据对应的量化数据,再利用各种目标数据的量化数据和不需要进行量化的待运算数据执行待量化层的运算。
神经网络运算的推理阶段可包括:将训练好的神经网络进行前向运算以完成设定任务的阶段。在神经网络的推理阶段,可以将神经元、权值、偏置和梯度中的至少一种作为待量化数据,根据本披露实施例中的方法进行量化后,利用量化后的数据完成待量化层的运算。
神经网络运算的微调阶段可包括:将训练好的神经网络进行预设数量迭代的前向运算和反向运算,进行参数的微调以适应设定任务的阶段。在神经网络运算的微调阶段,可以将神经元、权值、偏置、梯度中的至少一种,根据本公开实施例中的方法进行量化后,利用量化后的数据完成待量化层的前向运算或反向运算。
神经网络运算的训练阶段可包括:将初始化的神经网络进行迭代训练以得到训练好的神经网络的阶段,训练好的神经网络可执行特定任务。在神经网络的训练阶段,可以将神经元、权值、偏置、梯度中的至少一种,根据本公开实施例中的方法进行量化后,利用量化后的数据完成待量化层的前向运算或反向运算。
可以将一个目标数据中的子集作为待量化数据,可以按照不同的方式将目标数据划分为多个子集,将每个子集作为一个待量化数据。将一个目标数据划分为多个待量化数据。可以根据目标数据所要进行的运算类型,将目标数据划分为多个待量化数据。例如,目标数据需要进行卷积运算时,可以根据卷积核的高度和宽度,将目标数据划分为与卷积核对应的多个待量化数据。目标数据为需要进行矩阵乘运算的左矩阵时,可以将目标数据按行划分为多个待量化数据。可以一次将目标数据划分为多个待量化数据,也可以按照运算的顺序,将目标数据依次划分为多个待量化数据。
也可以根据预先设定的数据划分方式,将目标数据划分为多个待量化数据。例如,预先设定的数据划分方式可以为:按照固定的数据大小进行划分,或按照固定的数据形状进行划分。
将目标数据划分为多个待量化数据后,可以将各待量化数据分别进行量化,并根据各待量化数据量化后的数据进行运算。一个待量化数据所需的量化时间,短于目标数据的整体的量化时间,将其中一个待量化数据量化完毕后,即可以用量化后的数据执行后续的运算,而不用等目标数据中的所有待量化数据均量化完成后再执行运算。因此本公开中的目标数据的量化方法,可以提高目标数据的运算效率。
在步骤S3-12中,将各所述待量化数据分别根据对应的量化参数进行量化,得到与各所述待量化数据对应的量化数据。
待量化数据对应的量化参数可以为一个量化参数,也可以为多个量化参数。量化参数可以包括点位置等用于对待量化数据进行量化的参数。点位置可以用于确定量化后数据中小数点的位置。量化参数还可以包括缩放系数、偏移量等。
确定与待量化数据对应的量化参数的方式,可以包括:确定与目标数据对应的量化参数后,并将与目标数据对应的量化参数确定为待量化数据的量化参数的方式。当待量化层包括多个目标数据时, 各目标数据均可以有与之对应的量化参数,且各目标数据对应的量化参数可以不同,也可以相同,本公开对此不作限定。将目标数据划分为多个待量化数据后,可以将目标数据对应的量化参数确定为各待量化数据对应的量化参数,此时各待量化数据对应的量化参数相同。
确定与待量化数据对应的量化参数的方式,也可以包括:直接确定各待量化数据对应的量化参数的方式。目标数据可以没有与之对应的量化参数,或目标数据可以有与之对应的量化参数但待量化数据不采用。可以直接为各待量化数据设定对应的量化参数。也可以根据待量化数据计算得到对应的量化参数。此时各待量化数据对应的量化参数可以相同也可以不同。例如,当待量化层为卷积层,目标数据为权重时,可以将权重按照通道划分为多个待量化权重数据,不同通道的待量化权重数据可以对应不同的量化参数。当各待量化数据对应的量化参数不同时,各待量化数据利用对应的量化参数进行量化后,所得到的量化结果需不影响目标数据的运算。
确定与目标数据对应的量化参数的方式、或确定与待量化数据对应的量化参数的方式,可以包括:查找预设的量化参数直接确定量化参数的方式、查找对应关系以确定量化参数的方式,或根据待量化数据计算得到量化参数的方式。以下以确定与待量化数据对应的量化参数的方式为例进行说明:
可以直接设定与待量化数据对应的量化参数。可以将设定好的量化参数存储于设定的存储空间。设定的存储空间可以为片上或片外的存储空间。例如,可以将设定好的量化参数存储于设定的存储空间。各待量化数据在进行量化时,可以在设定的存储空间提取对应的量化参数后进行量化。可以根据经验值设定与每种待量化数据对应的量化参数。也可以根据需求更新所存储的与每种待量化数据对应的量化参数。
可以根据各待量化数据的数据特征,通过查找数据特征与量化参数的对应关系,确定量化参数。例如,待量化数据的数据分布为稀疏和稠密时可以分别对应不同的量化参数。可以通过查找对应关系确定与待量化数据的数据分布对应的量化参数。
还可以根据各待量化数据,利用设定的量化参数计算方法,计算得到各待量化层对应的量化参数。例如,可以根据待量化数据的绝对值最大值和预设的数据位宽,利用取整算法计算得到量化参数中的点位置。
在步骤S3-13中,根据与各所述待量化数据对应的量化数据得到所述目标数据的量化结果,以使所述待量化层根据所述目标数据的量化结果进行运算。
可以利用设定的量化算法,根据量化参数对待量化数据进行量化,得到量化数据。例如,可以利用取整算法作为量化算法,可以根据数据位宽和点位置对待量化数据进行取整量化得到量化数据。其中,取整算法可以包括向上取整、向下取整、向零取整和四舍五入取整等。本公开对量化算法的具体实现方式不作限定。
各待量化数据可以分别采用对应的量化参数进行量化。由于与各待量化数据对应的量化参数更为贴合各待量化数据自身的特征,使得各待量化层的每种量化数据的量化精度更加符合目标数据的运算需求,也就更加符合待量化层的运算需求。在保证待量化层的运算结果准确率的前提下,能够提高待量化层的运算效率,达到待量化层的运算效率和运算结果准确率之间的平衡。进一步的,将目标数据划分为多个待量化数据分别量化,可以在量化完一个待量化数据后,根据量化得到的量化结果执行运算的同时,可以进行第二个待量化数据的量化,从而在整体上提高目标数据的运算效率,也就提高了待量化层的计算效率。
可以将各待量化数据的量化数据进行合并后得到目标数据的量化结果。也可以将各待量化数据的量化数据进行设定的运算后得到目标数据的量化结果。例如可以将各待量化数据的量化数据按照设定 的权重进行加权运算后得到目标数据的量化结果。本公开对此不作限定。
在神经网络的推理、训练和微调过程中,可以对待量化数据进行离线量化或在线量化。其中,离线量化可以为利用量化参数对待量化数据进行离线处理。在线量化可以为利用量化参数对待量化数据进行在线处理。例如,神经网络运行在人工智能芯片上,可以将待量化数据和量化参数发送至人工智能芯片之外的运算装置进行离线量化,或利用人工智能芯片之外的运算装置对预先得到的待量化数据和量化参数进行离线量化。而在人工智能芯片运行神经网络的过程中,人工智能芯片可以对待量化数据利用量化参数进行在线量化。本公开中对各待量化数据的量化过程为在线或离线不作限定。
在本实施例所提供的神经网络量化方法,对于神经网络中的任意待量化层,所述方法包括:在待量化层的目标数据中确定多个待量化数据,各待量化数据均为目标数据的子集,目标数据为待量化层的任意一种待量化的待运算数据,待运算数据包括输入神经元、权值、偏置、梯度中的至少一种;将各待量化数据分别根据对应的量化参数进行量化,得到与各待量化数据对应的量化数据;根据与各待量化数据对应的量化数据得到目标数据的量化数据,以使待量化层根据目标数据的量化数据进行运算。将目标数据划分为多个待量化数据后,各待量化数据的量化过程与运算过程可以并行执行,从而可以提高目标数据的量化效率和运算效率,也可以提高待量化层直至提高整个神经网络的量化效率和运算效率。
在一种可能的实现方式中,所述待量化层为卷积层,所述目标数据为输入神经元。其中,在所述待量化层的目标数据中确定多个待量化数据,可以包括:
在所述卷积层的输入神经元中,根据卷积核的维度和步长确定与卷积核对应的多个待量化数据,所述卷积核的维度包括高度、宽度、通道数。
卷积层输入神经元的维度可以包括批数(batch,B)、通道(channel,C)、高度(height,H)和宽度(width,W)。当输入神经元的批数为多个时,各批数的输入神经元可以看作维度为通道、高度和宽度的三维数据。各批数的输入神经元可以对应多个卷积核,各批数的输入神经元的通道数,和与之对应的各卷积核的通道数一致。
对于任意一个批数的输入神经元,以及对于与该批数的输入神经元对应的多个卷积核中的任意一个卷积核,可以根据该卷积核的高度、宽度和步长,将该批次的输入神经元与该卷积核对应的部分数据(子集),确定为该批次的输入神经元与该卷积核对应的多个待量化数据。
在一种可能的实现方式中,在输入神经元中确定出的各待量化数据的维度与卷积核的维度一致。图4-2示出根据本公开实施例的将输入神经元按照卷积核确定待量化数据的示意图。如图4-2所示,输入神经元的维度为5×5×3(H×W×C),与之对应的一个卷积核(图中未示出)的维度为3×3×3(H×W×C)。在图4-2中,示出了根据卷积核确定出的待量化数据1,图4-2中待量化数据1的颜色比输入神经元的颜色稍浅,待量化数据1的维度为3×3×3(H×W×C)。图4-3示出根据本公开实施例的将输入神经元按照卷积核确定待量化数据的示意图。在图4-3中,示出了根据卷积核确定出的待量化数据2,图4-3中待量化数据2的颜色比输入神经元的颜色稍深,待量化数据2的维度为3×3×3(H×W×C)。与待量化数据1相比,待量化数据2在W维度方向向右移动了与步长一致的1格。待量化数据1和待量化数据2的维度与卷积核的维度一致。
可以理解的是,根据如图4-2和图4-3示出的待量化数据1和待量化数据2的确定方法,输入神经元的其他待量化数据,可以根据卷积核的维度和步长依次得到。在此不再赘述。
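上述按卷积核的维度和步长确定待量化数据的过程,可以用如下Python草图示意(以维度为(H, W, C)的单批次输入神经元为例,函数名与参数均为示意性假设):

```python
import numpy as np

def patches_by_kernel(x: np.ndarray, kh: int, kw: int, stride: int):
    """示意:按卷积核高度kh、宽度kw与步长stride,在输入神经元x(H, W, C)中
    依次确定与卷积核对应的各待量化数据(维度与卷积核一致的子集)。"""
    h, w, _ = x.shape
    for i in range(0, h - kh + 1, stride):
        for j in range(0, w - kw + 1, stride):
            yield x[i:i + kh, j:j + kw, :]

# 对应图4-2/图4-3的示例:x为5x5x3、卷积核为3x3x3、步长为1时,共得到9个待量化数据。
```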
可以根据卷积核的维度和步长,将输入神经元划分得到全部的待量化数据后,将各待量化数据并行地执行量化过程。由于待量化数据的数据量小于输入神经元,对一个待量化数据进行量化的计算量, 小于对输入神经元进行整体量化的计算量,因此,本实施例中的量化方法可以提高输入神经元的量化速度,提高量化效率。也可以将输入神经元根据卷积核的维度和步长进行划分,依次得到各待量化数据后,将得到的各待量化数据分别与卷积核进行卷积运算。各待量化数据的量化过程和卷积运算过程可以并行执行,本实施例中的量化方法可以提高输入神经元的量化效率和运算效率。
在一种可能的实现方式中,在输入神经元中确定出的各待量化数据的维度与卷积核的维度也可以不一致。各待量化数据的维度可以小于卷积核的维度,且卷积核的至少一个维度为待量化数据对应维度的倍数。各待量化数据的维度也可以大于卷积核的维度,且待量化数据的至少一个维度为卷积核对应维度的倍数。
各待量化数据的维度可以小于卷积核的维度,例如,卷积核A的维度为8×8×3时,待量化数据A1的维度可以为4×8×3,待量化数据A2的维度可以为4×8×3,且待量化数据A1和待量化数据A2组成的子集为与卷积核A进行卷积运算的数据。则可以将待量化数据A1和待量化数据A2的量化结果进行拼接,并根据拼接结果与卷积核A进行卷积运算。
各待量化数据的维度也可以大于卷积核的维度,例如,卷积核A的维度为8×8×3时,待量化数据A1的维度可以为16×8×3。则可以将待量化数据A1量化结果进行拆分后,根据拆分结果与卷积核A进行卷积运算。
在一种可能的实现方式中,在对目标数据进行量化的过程中可以采用与目标数据对应的量化参数进行量化。而将目标数据划分为多个待量化数据后,可以采用与各待量化数据对应的量化参数进行量化。与各待量化数据对应的量化参数,可以采用预设的方式或根据待量化数据计算的方式,无论采用何种方式确定与各待量化数据对应的量化参数,都可以使得各待量化数据的量化参数,更加符合待量化数据自身的量化需求。例如,当根据目标数据计算得到对应的量化参数时,可以利用目标数据中各元素的最大值和最小值计算得到量化参数。而在根据待量化数据计算得到对应的量化参数时,可以利用待量化数据中各元素的最大值和最小值计算得到量化参数,待量化数据的量化参数比目标数据的量化参数能够更加贴合待量化数据的数据特征,可以使得待量化数据的量化结果更加准确,量化精度更高。
在本实施例中,在所述卷积层的输入神经元中,根据卷积核的维度和步长确定与卷积核对应的多个待量化数据,所述卷积核的维度包括高度、宽度、通道数。根据卷积核的维度和步长确定出的待量化数据后,对各待量化数据进行量化的计算量,小于对目标数据进行量化的计算量,可以提高目标数据的量化效率。将各待量化数据的量化过程和运算过程并行执行可以提高目标数据的量化效率和运算效率。将各待量化数据根据对应的量化参数进行量化,量化参数可以更加贴合待量化数据自身的量化需求,使得待量化数据的量化结果更加准确。
在一种可能的实现方式中,在所述待量化层的目标数据中确定多个待量化数据,包括:
根据所述目标数据的维度,在所述待量化层的目标数据中确定多个待量化数据,所述目标数据的维度包括批数、通道、高度、宽度。
可以按照目标数据的一个或多个维度,对目标数据进行划分后,得到多个待量化数据。
可以按照目标数据的一个维度对目标数据进行划分,例如,可以将所述待量化层的目标数据中一个或多个批数的数据,确定为一个待量化数据。假定目标数据B1有3个批数的数据,若将目标数据中一个批数的数据确定为一个待量化数据,则该目标数据B1可以被划分为3个待量化数据。也可以将所述待量化层的目标数据中一个或多个通道的数据,确定为一个待量化数据。假定目标数据B2对应于4个通道,若将目标数据中2个通道的数据确定为一个待量化数据,则该目标数据B2可以被划分为2个待量化数据,每个待量化数据对应包括两个通道的数据。也可以根据高度、宽度对目标数据进行划分,例如,假定目标数据为维度为4×8×3的输入神经元,可以以输入神经元宽度的一半为划分依据,将输入神经元划分为2个待量化数据,每个待量化数据的维度为4×4×3。也可以以输入神经元的高度的一半为划分依据,将输入神经元划分为2个待量化数据,每个待量化数据的维度为2×8×3。
还可以按照目标数据的多个维度对目标数据进行划分,例如,可以根据目标数据的高度和宽度,对目标数据进行划分。例如,假定目标数据为维度为4×8×3的输入神经元,可以以输入神经元宽度的一半、高度的一半为划分依据,将输入神经元划分为4个待量化数据,每个待量化数据的维度为2×4×3。
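上述按高度、宽度等维度划分目标数据的方式,可以用如下Python草图示意(仅为示例性草图,划分份数等参数为假设):

```python
import numpy as np

def split_target_data(target: np.ndarray, h_parts: int = 1, w_parts: int = 1):
    """示意:将维度为(H, W, C)的目标数据按高度划分h_parts份、按宽度划分w_parts份,
    得到多个待量化数据。例如4x8x3的输入神经元在h_parts=2、w_parts=2时
    划分为4个2x4x3的待量化数据。"""
    pieces = []
    for rows in np.array_split(target, h_parts, axis=0):
        pieces.extend(np.array_split(rows, w_parts, axis=1))
    return pieces
```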
在一种可能的实现方式中,在所述待量化层的目标数据中确定多个待量化数据,可以包括:
根据运行所述神经网络的装置的实时处理能力,在所述待量化层的目标数据中确定多个待量化数据,各所述待量化数据的尺寸与所述实时处理能力正相关。
运行神经网络的装置的实时处理能力可以包括:装置对目标数据进行量化的速度,对量化后数据进行运算的速度,对目标数据进行量化和运算时装置所能处理的数据量等表征装置处理目标数据的处理能力相关的信息。例如,可以根据对目标数据进行量化的速度和对量化后数据进行运算的速度,确定待量化数据的尺寸,以使得对待量化数据进行量化的时间与对量化后数据进行运算的速度相同,这样可以量化和运算同步进行,可以提高目标数据的运算效率。运行神经网络的装置的实时处理能力越强,待量化数据的尺寸越大。
在一种可能的实现方式中,该方法还可以包括:根据各所述待量化数据和对应的数据位宽计算得到对应的量化参数。
在该实现方式中,可以对待量化数据进行统计,根据统计结果和数据位宽确定待量化数据对应的量化参数。量化参数可以包括点位置、缩放系数和偏移量中的一种或多种。
在一种可能的实现方式中,根据各所述待量化数据和对应的数据位宽计算得到对应的量化参数,可以包括:
当所述量化参数不包括偏移量时,根据各所述待量化数据中的绝对值最大值Z1和对应的数据位宽,得到各所述待量化数据的第一类点位置。其中,该绝对值最大值Z1是待量化数据中数据取绝对值后所得到的最大值。

在该实现方式中,当待量化数据为相对于原点对称的数据时,量化参数可以不包括偏移量。假设Z1为待量化数据中元素的绝对值的最大值,待量化数据对应的数据位宽为n,A1为用数据位宽n对待量化数据进行量化后的量化数据可以表示的最大值,则A1为2^s1×(2^(n−1)−1)。A1需要包含Z1,且Z1要大于2^(s1−1)×(2^(n−1)−1),因此有公式(3-1)的约束:

2^s1×(2^(n−1)−1) ≥ Z1 > 2^(s1−1)×(2^(n−1)−1)   公式(3-1)

处理器可以根据待量化数据中的绝对值最大值Z1和数据位宽n,计算得到第一类点位置s1。例如,可以利用如下公式(3-2)计算得到待量化数据对应的第一类点位置s1:

s1 = ceil( log2( Z1 / (2^(n−1)−1) ) )   公式(3-2)

其中,ceil为向上取整,Z1为待量化数据中的绝对值最大值,s1为第一类点位置,n为数据位宽。
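公式(3-2)的计算可以用如下Python小草图示意(函数名为示意性假设):

```python
import math

def first_point_position(z1: float, n: int) -> int:
    """按公式(3-2)计算第一类点位置s1:z1为待量化数据中的绝对值最大值,n为数据位宽。"""
    return math.ceil(math.log2(z1 / (2 ** (n - 1) - 1)))
```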
在一种可能的实现方式中,根据各所述待量化数据和对应的数据位宽计算得到对应的量化参数,可以包括:
当所述量化参数包括偏移量时,根据各所述待量化数据中的最大值、最小值和对应的数据位宽,得到各所述待量化数据的第二类点位置s2。

在该实现方式中,可以先获取待量化数据中的最大值Zmax和最小值Zmin,进而根据最大值Zmax、最小值Zmin利用下述公式(3-3)进行计算:

Z2 = (Zmax − Zmin) / 2   公式(3-3)

进一步地,根据计算得到的Z2和对应的数据位宽,利用下述公式(3-4)计算第二类点位置s2:

s2 = ceil( log2( Z2 / (2^(n−1)−1) ) )   公式(3-4)
在该实现方式中,由于量化时,常规情况下会将待量化数据中的最大值和最小值保存下来,直接基于保存的待量化数据中的最大值和最小值来获取绝对值最大值,无需消耗更多的资源去对待量化数据求绝对值,节省确定统计结果的时间。
在一种可能的实现方式中,根据各所述待量化数据和对应的数据位宽计算得到对应的量化参数,包括:
当所述量化参数不包括偏移量时,根据各所述待量化数据和对应的数据位宽得到量化后数据的最大值;
根据各所述待量化数据中的绝对值最大值和所述量化后数据的最大值,得到各所述待量化数据的第一类缩放系数f'。其中,第一类缩放系数f'可以包括第一缩放系数f1和第二缩放系数f2。

其中,该第一缩放系数f1可以按照如下公式(3-5)进行计算:

f1 = Z1 / A1 = Z1 / ( 2^s1 × (2^(n−1)−1) )   公式(3-5)

其中,第二缩放系数f2可以按照如下公式(3-6)进行计算:

f2 = 2^s1 × f1 = Z1 / (2^(n−1)−1)   公式(3-6)
在一种可能的实现方式中,根据各所述待量化数据和对应的数据位宽计算得到对应的量化参数,可以包括:
根据各所述待量化数据中的最大值和最小值,得到各所述待量化数据的偏移量。
在该实现方式中,图4-4示出根据本公开实施例的对称的定点数表示的示意图。如图4-4所示的待量化数据的数域是以"0"为对称中心分布。Z1为待量化数据的数域中所有浮点数的绝对值最大值,在图4-4中,A1为n位定点数可以表示的浮点数的最大值,浮点数A1转换为定点数是2^(n−1)−1。为了避免溢出,A1需要包含Z1。在实际运算中,神经网络运算过程中的浮点数据趋向于某个确定区间的正态分布,但是并不一定满足以"0"为对称中心的分布,这时用定点数表示时,容易出现溢出情况。为了改善这一情况,量化参数中引入偏移量。图4-5示出根据本公开实施例的引入偏移量的定点数表示的示意图。如图4-5所示,待量化数据的数域不是以"0"为对称中心分布,Zmin是待量化数据的数域中所有浮点数的最小值,Zmax是待量化数据的数域中所有浮点数的最大值,A2为用n位定点数表示的平移后的浮点数的最大值,A2为2^s2×(2^(n−1)−1)。P为Zmin~Zmax之间的中心点,将待量化数据的数域整体偏移,使得平移后的待量化数据的数域以"0"为对称中心分布,以避免数据的"溢出"。平移后的待量化数据的数域中的绝对值最大值为Z2。由图4-5可知,偏移量为"0"点到"P"点之间的水平距离,该距离称为偏移量o。

可以根据该最小值Zmin和最大值Zmax按照如下公式(3-7)计算获得偏移量:

o = (Zmin + Zmax) / 2   公式(3-7)

其中,o表示偏移量,Zmin表示待量化数据所有元素中的最小值,Zmax表示待量化数据所有元素中的最大值。
在一种可能的实现方式中,根据各所述待量化数据和对应的数据位宽计算得到对应的量化参数,可以包括:
当所述量化参数包括偏移量时,根据各所述待量化数据和对应的数据位宽得到量化后数据的最大值;
根据各所述待量化数据中的最大值、最小值和量化后数据的最大值,得到各所述待量化数据的第二类缩放系数f”。其中,第二类缩放系数f”可以包括第三缩放系数f 3和第四缩放系数f 4
在该实现方式中,当量化参数包括偏移量时,A2为用数据位宽n对平移后的待量化数据进行量化后的量化数据可以表示的最大值,A2为2^s2×(2^(n−1)−1)。可以根据待量化数据中的最大值Zmax、最小值Zmin计算得到平移后的待量化数据的数域中的绝对值最大值Z2,进而按照如下公式(3-8)计算第三缩放系数f3:

f3 = Z2 / A2 = Z2 / ( 2^s2 × (2^(n−1)−1) )   公式(3-8)

进一步地,第四缩放系数f4可以按照如下公式(3-9)进行计算:

f4 = 2^s2 × f3 = Z2 / (2^(n−1)−1)   公式(3-9)
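结合公式(3-3)、(3-4)与(3-7)~(3-9),包含偏移量时各量化参数的计算可以用如下Python草图示意(函数名与返回值的组织方式均为示意性假设):

```python
import math

def asym_quant_params(z_min: float, z_max: float, n: int):
    """示意:计算偏移量o、第二类点位置s2、第三缩放系数f3与第四缩放系数f4。"""
    o = (z_min + z_max) / 2                              # 公式(3-7)
    z2 = (z_max - z_min) / 2                             # 公式(3-3)
    s2 = math.ceil(math.log2(z2 / (2 ** (n - 1) - 1)))   # 公式(3-4)
    a2 = 2 ** s2 * (2 ** (n - 1) - 1)                    # 平移后量化数据可表示的最大值
    f3 = z2 / a2                                         # 公式(3-8)
    f4 = z2 / (2 ** (n - 1) - 1)                         # 公式(3-9)
    return o, s2, f3, f4
```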
在对待量化数据进行量化时,所采用的量化参数不同,进行量化所使用的数据不同。
在一种可能的实现方式中,量化参数可以包括第一类点位置s1。可以利用如下的公式(3-10)对待量化数据进行量化,得到量化数据Ix:

Ix = round( Fx / 2^s1 )   公式(3-10)

其中,Ix为量化数据,Fx为待量化数据,round为进行四舍五入的取整运算。

量化参数包括第一类点位置s1时,可以根据公式(3-11)对目标数据的量化数据进行反量化,得到目标数据的反量化数据F̂x:

F̂x = round( Fx / 2^s1 ) × 2^s1   公式(3-11)
在一种可能的实现方式中,量化参数可以包括第一类点位置和第一缩放系数。可以利用如下的公式(3-12)对待量化数据进行量化,得到量化数据Ix:

Ix = round( Fx / ( 2^s1 × f1 ) )   公式(3-12)

当量化参数包括第一类点位置和第一缩放系数时,可以根据公式(3-13)对目标数据的量化数据进行反量化,得到目标数据的反量化数据F̂x:

F̂x = round( Fx / ( 2^s1 × f1 ) ) × 2^s1 × f1   公式(3-13)
在一种可能的实现方式中,量化参数可以包括第二缩放系数。可以利用如下的公式(3-14)对待量化数据进行量化,得到量化数据Ix:

Ix = round( Fx / f2 )   公式(3-14)

当量化参数包括第二缩放系数时,可以根据公式(3-15)对目标数据的量化数据进行反量化,得到目标数据的反量化数据F̂x:

F̂x = round( Fx / f2 ) × f2   公式(3-15)
在一种可能的实现方式中,量化参数可以包括偏移量。可以利用如下的公式(3-16)对待量化数据进行量化,得到量化数据Ix:

Ix = round( Fx − o )   公式(3-16)

当量化参数包括偏移量时,可以根据公式(3-17)对目标数据的量化数据进行反量化,得到目标数据的反量化数据F̂x:

F̂x = round( Fx − o ) + o   公式(3-17)
在一种可能的实现方式中,量化参数可以包括第二类点位置和偏移量。可以利用如下的公式(3-18)对待量化数据进行量化,得到量化数据Ix:

Ix = round( ( Fx − o ) / 2^s2 )   公式(3-18)

当量化参数包括第二类点位置和偏移量时,可以根据公式(3-19)对目标数据的量化数据进行反量化,得到目标数据的反量化数据F̂x:

F̂x = round( ( Fx − o ) / 2^s2 ) × 2^s2 + o   公式(3-19)
在一种可能的实现方式中,量化参数可以包括第二类缩放系数f''和偏移量o。可以利用如下的公式(3-20)对待量化数据进行量化,得到量化数据Ix:

Ix = round( ( Fx − o ) / f'' )   公式(3-20)

当量化参数包括第二类缩放系数和偏移量时,可以根据公式(3-21)对目标数据的量化数据进行反量化,得到目标数据的反量化数据F̂x:

F̂x = round( ( Fx − o ) / f'' ) × f'' + o   公式(3-21)
在一种可能的实现方式中,量化参数可以包括第二类点位置、第二类缩放系数和偏移量。可以利用如下的公式(3-22)对待量化数据进行量化,得到量化数据Ix:

Ix = round( ( Fx − o ) / ( 2^s2 × f'' ) )   公式(3-22)

当量化参数包括第二类点位置、第二类缩放系数和偏移量时,可以根据公式(3-23)对目标数据的量化数据进行反量化,得到目标数据的反量化数据F̂x:

F̂x = round( ( Fx − o ) / ( 2^s2 × f'' ) ) × 2^s2 × f'' + o   公式(3-23)
可以理解的是,也可以采用其他的取整运算方法,例如采用向上取整、向下取整、向零取整等取整运算,替换上述公式中的四舍五入的取整运算round。可以理解的是,在数据位宽一定的情况下,根据点位置量化得到的量化数据中,小数点后的位数越多,量化数据的量化精度越大。
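以量化参数包括第二类点位置和偏移量的情形为例,公式(3-18)与公式(3-19)的量化/反量化过程可以用如下Python草图示意(np.round为就近取整,与严格的四舍五入略有差异,此处仅为示意,函数名为假设):

```python
import numpy as np

def quantize(fx: np.ndarray, s2: int, o: float) -> np.ndarray:
    """按公式(3-18)量化:利用第二类点位置s2和偏移量o得到量化数据。"""
    return np.round((fx - o) / 2.0 ** s2)

def dequantize(ix: np.ndarray, s2: int, o: float) -> np.ndarray:
    """按公式(3-19)反量化:由量化数据恢复反量化数据。"""
    return ix * 2.0 ** s2 + o
```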
在一种可能的实现方式中,上述步骤S3-11可以包括:
通过查找待量化数据与量化参数对应关系,确定与所述待量化层中每种待量化数据对应的量化参数。
在一种可能的实现方式中,各待量化层中与每种待量化数据对应的量化参数,可以是保存的预设值。可以为神经网络建立一个待量化数据与量化参数之间的对应关系,该对应关系可以包括各待量化层的每种待量化数据与量化参数对应关系,并将对应关系保存在各层可以共享访问的存储空间。也可 以为神经网络建立多个待量化数据与量化参数之间的对应关系,各待量化层分别对应其中一个对应关系。可以将各层的对应关系保存在本层独享的存储空间,也可以将各层的对应关系保存在各层可以共享访问的存储空间。
在待量化数据与量化参数对应关系中,可以包括多个待量化数据和与之对应的多个量化参数之间的对应关系。例如,待量化数据与量化参数对应关系A中,可以包括待量化层1的神经元和权值两个待量化数据,神经元对应点位置1、缩放系数1和偏移量1三个量化参数,权值对应点位置2和偏移量2两个量化参数。本披露对待量化数据与量化参数对应关系的具体格式不做限定。
在本实施例中,可以通过查找待量化数据与量化参数对应关系,确定与所述待量化层中每种待量化数据对应的量化参数。可以为各待量化层预设对应的量化参数,并通过对应关系进行存储后,供待量化层查找后使用。本实施例中量化参数的获取方式简单方便。
图4-6示出根据本公开实施例的神经网络量化方法的流程图。在一种可能的实现方式中,如图4-6所示,该方法还可以包括步骤S3-14至步骤S3-16。
在步骤S3-14中,根据各所述待量化数据和各所述待量化数据对应的量化数据,确定各所述待量化数据对应的量化误差。
可以根据待量化数据对应的量化数据与待量化数据之间的误差,确定待量化数据的量化误差。可以利用设定的误差计算方法,例如标准差计算方法、均方根误差计算方法等,计算待量化数据的量化误差。
也可以根据量化参数,将待量化数据对应的量化数据进行反量化后得到反量化数据,根据反量化数据与待量化数据之间的误差,按照公式(3-24)确定待量化数据的量化误差diff_bit:

diff_bit = log2( ∑i|F̂i − Fi| / ∑i|Fi| + 1 )   公式(3-24)

其中,Fi为待量化数据对应的浮点值,F̂i为该浮点值对应的反量化数据,i为待量化数据中数据的下标。
还可以根据量化间隔、量化后的数据的个数以及对应的量化前的数据,按照公式(3-25)确定量化误差diff_bit:

diff_bit = log2( C × 2^(−1) × m / ∑i|Fi| + 1 )   公式(3-25)

其中,C为量化时对应的量化间隔,m为量化后获得的量化数据的个数,Fi为待量化数据对应的浮点值,i为待量化数据中数据的下标。
也可以根据量化后的数据以及对应的反量化数据,按照公式(3-26)确定量化误差diff_bit:

diff_bit = log2( ( ∑i|F̂i| − ∑i|Fi| ) / ∑i|Fi| + 1 )   公式(3-26)

其中,Fi为待量化数据对应的浮点值,F̂i为该浮点值对应的反量化数据,i为待量化数据集合中数据的下标。
在步骤S3-15中,根据各所述待量化数据对应的量化误差和误差阈值,调整各所述待量化数据对应的数据位宽,得到各所述待量化数据对应的调整位宽。
可以根据经验值确定误差阈值,误差阈值可以用于表示对量化误差的期望值。当量化误差大于或小于误差阈值时,可以调整待量化数对应的数据位宽,得到待量化数据对应的调整位宽。可以将数据位宽调整为更长的位宽或更短的位宽,以提高或降低量化精度。
可以根据能够接受的最大误差确定误差阈值,当量化误差大于误差阈值时,说明量化精度不能达到预期,需要将数据位宽调整为更长的位宽。也可以根据较高的量化精度确定一个较小的误差阈值,当量化误差小于误差阈值时,说明量化精度较高,神经网络的运行效率将受到影响,可以适当的将数据位宽调整为更短的位宽,以适当的降低量化精度,提高神经网络的运行效率。
可以将数据位宽按照固定的位数步长进行调整,也可以根据量化误差与误差阈值之间的差值的不同,按照可变的调整步长调整数据位宽。本公开对此不作限定。
在步骤S3-16中,将各所述待量化数据对应的数据位宽更新为对应的调整位宽,根据各所述待量化数据和对应的调整位宽计算得到对应的调整量化参数,以使各所述待量化数据根据所述对应的调整量化参数进行量化。
确定调整位宽后,可以将待量化数据对应的数据位宽更新为调整位宽。例如,待量化数据更新前的数据位宽为8位,调整位宽为12位,则更新后待量化数据对应的数据位宽为12位。可以根据调整位宽和待量化数据计算得到待量化数据对应的调整量化参数。可以根据待量化数据对应的调整量化参数重新对待量化数据进行量化,以得到量化精度更高或更低的量化数据,使得待量化层在量化精度和处理效率之间达到平衡。
在神经网络的推理、训练和微调过程中,各层之间的待量化数据可以认为具有一定的关联性。例如,各层的待量化数据之间的均值之间的差小于设定的均值阈值,且各层的待量化数据之间的最大值之间的差值也小于设定的差值阈值时,可以将待量化层的调整量化参数作为后续的一个或多个层的调整量化参数,用于对待量化层后续的一个或多个层的待量化数据进行量化。也可以在神经网络的训练和微调过程中,将待量化层在当前迭代得到的调整量化参数,用于在后续的迭代中对待量化层进行量化。
在一种可能的实现方式中,所述方法还包括:
在所述待量化层之后的一层或多层采用所述待量化层的量化参数。
神经网络根据调整量化参数进行量化,可以包括只在待量化层利用调整量化参数对待量化数据重新进行量化,并将重新得到的量化数据用于待量化层的运算。也可以包括在待量化层不使用调整量化参数重新对待量化数据进行量化,而在待量化层后续的一个或多个层使用调整量化参数进行量化,和/或后续的迭代中在待量化层使用调整量化参数进行量化。还可以包括在待量化层使用调整量化参数重新进行量化,并将重新得到的量化数据用于待量化层的运算,并且在待量化层后续的一个或多个层使用调整量化参数进行量化,和/或后续的迭代中在待量化层使用调整量化参数进行量化。本公开对此不作限定。
在本实施例中,根据待量化数据和待量化数据对应的量化数据之间的误差调整数据位宽,并根据调整后的数据位宽计算得到调整量化参数。通过设置不同的误差阈值可以得到不同的调整量化参数, 达到提高量化精度或提高运行效率等不同的量化需求。根据待量化数据和待量化数据的量化数据计算得到的调整量化参数,也能够更加符合待量化数据自身的数据特征,达到更加符合待量化数据自身需求的量化结果,在量化精度和处理效率之间达到更好的平衡。
在一种可能的实现方式中,步骤S3-15可以包括:
当所述量化误差大于第一误差阈值时,增加所述对应的数据位宽,得到所述对应调整位宽。
可以根据能够接受的最大的量化误差,确定第一误差阈值。可以将量化误差与第一误差阈值进行比较。当量化误差大于第一误差阈值时,可以认为量化误差已经不可接受。需要提高量化精度,可以通过增加待量化数据对应的数据位宽的方式,提高待量化数据的量化精度。
可以将待量化数据对应的数据位宽按照固定的调整步长增加,得到调整位宽。固定的调整步长可以为N位,N为正整数。每次调整数据位宽可以增加N位。每次增加后的数据位宽=原数据位宽+N位。
可以将待量化数据对应的数据位宽按照可变的调整步长增加,得到调整位宽。例如,当量化误差与误差阈值之间的差值大于第一阈值时,可以按照调整步长M1调整数据位宽,当量化误差与误差阈值之间的差值小于第二阈值时,可以按照调整步长M2调整数据位宽,其中,第一阈值大于第二阈值,M1大于M2。可以根据需求确定各可变的调整步长。本公开对数据位宽的调整步长及调整步长是否可变不作限定。
可以将待量化数据按照调整位宽计算得到调整后的量化参数。利用调整后的量化参数对待量化数据进行重新量化后得到的量化数据,比利用调整前的量化参数量化得到的量化数据的量化精度更高。
在一种可能的实现方式中,该方法还可以包括:
根据各所述待量化数据和对应的调整位宽计算各所述待量化数据调整后的量化误差;
根据所述调整后的量化误差和所述第一误差阈值继续增加所述对应的调整位宽,直至所述调整后的量化误差小于或等于所述第一误差阈值。
根据量化误差增加待量化数据对应的数据位宽时,调整一次位宽后得到调整位宽,根据调整位宽计算得到调整后的量化参数,根据调整后的量化参数量化待量化数据得到调整后的量化数据,再根据调整后的量化数据与待量化数据计算得到待量化数据调整后的量化误差,调整后的量化误差可能依然大于第一误差阈值,即根据调整一次的数据位宽可能不能满足调整目的。当调整后的量化误差依然大于第一误差阈值时,可以继续对调整后的数据位宽进行调整,即多次增加待量化数据对应的数据位宽,直至根据最终得到的调整位宽和待量化数据得到的调整后的量化误差小于第一误差阈值。
多次增加的调整步长可以是固定的调整步长,也可以是可变的调整步长。例如,最终的数据位宽=原数据位宽+B*N位,其中N为每次增加的固定的调整步长,B为数据位宽的增加次数。最终的数据位宽=原数据位宽+M1+M2+…+Mm,其中,M1、M2…Mm为每次增加的可变的调整步长。
在本实施例中,当量化误差大于第一误差阈值时,增加所述待量化数据对应的数据位宽,得到所述待量化数据对应的调整位宽。可以通过设置第一误差阈值和调整步长增加数据位宽,以使调整后的数据位宽能够满足量化的需求。当一次调整不能满足调整需求时,还可以对数据位宽进行多次调整。第一误差阈值和调整步长的设置,使得量化参数可以按照量化需求进行灵活调整,满足不同的量化需求,使得量化精度可根据自身数据特征进行自适应调整。
在一种可能的实现方式中,步骤S3-15可以包括:
当所述量化误差小于第二误差阈值时,减小所述对应的数据位宽,得到所述对应调整位宽,所述第二误差阈值小于所述第一误差阈值。
可以根据能够接受的量化误差和期望的神经网络的运行效率,确定第二误差阈值。可以将量化误差与第二误差阈值进行比较。当量化误差小于第二误差阈值时,可以认为量化误差超出预期,但运行效率过低已经不可接受。可以降低量化精度以提高神经网络的运行效率,可以通过减少待量化数据对应的数据位宽的方式,降低待量化数据的量化精度。
可以将待量化数据对应的数据位宽按照固定的调整步长减少,得到调整位宽。固定的调整步长可以为N位,N为正整数。每次调整数据位宽可以减少N位。减少后的数据位宽=原数据位宽-N位。
可以将待量化数据对应的数据位宽按照可变的调整步长减少,得到调整位宽。例如,当量化误差与误差阈值之间的差值大于第一阈值时,可以按照调整步长M1调整数据位宽,当量化误差与误差阈值之间的差值小于第二阈值时,可以按照调整步长M2调整数据位宽,其中,第一阈值大于第二阈值,M1大于M2。可以根据需求确定各可变的调整步长。本披露对数据位宽的调整步长及调整步长是否可变不作限定。
可以将待量化数据按照调整位宽计算得到调整后的量化参数,利用调整后的量化参数对待量化数据进行重新量化后得到的量化数据,比利用调整前的量化参数量化得到的量化数据的量化精度更低。
在一种可能的实现方式中,该方法还可以包括:
根据所述调整位宽和所述待量化数据计算所述待量化数据调整后的量化误差;
根据所述调整后的量化误差和所述第二误差阈值继续减少所述调整位宽,直至根据调整位宽和所述待量化数据计算得到的调整后的量化误差大于或等于所述第二误差阈值。
根据量化误差减少待量化数据对应的数据位宽时,调整一次位宽后得到调整位宽,根据调整位宽计算得到调整后的量化参数,根据调整后的量化参数量化待量化数据得到调整后的量化数据,再根据调整后的量化数据与待量化数据计算得到待量化数据调整后的量化误差,调整后的量化误差可能依然小于第二误差阈值,即根据调整一次的数据位宽可能不能满足调整目的。当调整后的量化误差依然小于第二误差阈值时,可以继续对调整后的数据位宽进行调整,即多次减少待量化数据对应的数据位宽,直至根据最终得到的调整位宽和待量化数据得到的调整后的量化误差大于第二误差阈值。
多次减少的调整步长可以是固定的调整步长,也可以是可变的调整步长。例如,最终的数据位宽=原数据位宽-B*N位,其中N为每次增加的固定的调整步长,B为数据位宽的增加次数。最终的数据位宽=原数据位宽-M1-M2-…-Mm,其中,M1、M2…Mm为每次减少的可变的调整步长。
在本实施例中,当量化误差小于第二误差阈值时,减少所述待量化数据对应的数据位宽,得到所述待量化数据对应的调整位宽。可以通过设置第二误差阈值和调整步长减少数据位宽,以使调整后的数据位宽能够满足量化的需求。当一次调整不能满足调整需求时,还可以对数据位宽进行多次调整。第二误差阈值和调整步长的设置,使得量化参数可以按照量化需求进行灵活的自适应调整,满足不同的量化需求,使得量化精度可调,在量化精度和神经网络的运行效率之间达到平衡。
在一种可能的实现方式中,所述方法还包括:
当所述量化误差大于第一误差阈值时,增加所述待量化数据对应的数据位宽,以及当所述量化误差小于第二误差阈值时,减少所述待量化数据对应的数据位宽,得到所述待量化数据对应的调整位宽。
也可以同时设置两个误差阈值,其中,第一误差阈值用于表示量化精度过低,可以增加数据位宽的位数,第二误差阈值用于表示量化精度过高,可以减少数据位宽的位数。第一误差阈值大于第二误差阈值,可以将待量化数据的量化误差同时与两个误差阈值进行比较,当量化误差大于第一误差阈值时,增加数据位宽的位数,当量化误差小于第二误差阈值时,减少数据位宽的位数。当量化误差位于第一误差阈值和第二误差阈值之间时,数据位宽可以保持不变。
在本实施例中,通过将量化误差与第一误差阈值和第二误差阈值同时进行比较,可以根据比较结果增加或减少数据位宽,可以利用第一误差阈值和第二误差阈值更加灵活的调整数据位宽。使得数据位宽的调整结果更加符合量化需求。
在一种可能的实现方式中,在所述神经网络运算的微调阶段和/或训练阶段,该方法还可以包括:
获取当前迭代以及历史迭代中待量化数据的数据变动幅度,所述历史迭代为所述当前迭代之前的迭代;
根据所述待量化数据的数据变动幅度,确定所述待量化数据对应的目标迭代间隔,以使所述待量化层根据所述目标迭代间隔更新所述待量化数据的量化参数,所述目标迭代间隔包括至少一次迭代。
在神经网络运算的微调阶段和/或训练阶段包括多次迭代。神经网络中的各待量化层,在进行一次正向运算和一次反向运算,并对待量化层的权值进行更新后,完成一次迭代。在多次迭代中,待量化层中的待量化数据和/或待量化数据对应的量化数据的数据变动幅度,可以用于衡量在不同迭代中的待量化数据和/或量化数据是否可采用相同的量化参数进行量化。若当前迭代以及历史迭代中待量化数据的数据变动幅度较小,例如小于设定的幅度变动阈值时,可以在数据变动幅度较小的多个迭代中采用相同的量化参数。
可以通过提取预存的量化参数的方式,确定与待量化数据对应的量化参数。在不同的迭代中对待量化数据进行量化时,需要在各迭代提取与待量化数据对应的量化参数。若多个迭代的待量化数据和/或待量化数据对应的量化数据的数据变动幅度较小,可将在数据变动幅度较小的多个迭代中采用的相同的量化参数进行暂存,各迭代在进行量化时可以利用暂存的量化参数进行量化运算,不用在每次迭代提取量化参数。
也可以根据待量化数据和数据位宽计算得到量化参数。在不同的迭代中对待量化数据进行量化时,需要在各迭代分别计算量化参数。若多个迭代的待量化数据和/或待量化数据对应的量化数据的数据变动幅度较小,可在数据变动幅度较小的多个迭代中采用的相同的量化参数,则各迭代均可以直接使用其中第一个迭代计算得到的量化参数,而不是每次迭代计算量化参数。
可以理解的是,当待量化数据为权值时,各迭代之间的权值在不断更新,若多个迭代的权值的数据变动幅度较小,或多个迭代的权值对应的量化数据的数据变动幅度较小,可以在多个迭代中利用相同的量化参数对权值进行量化。
可以根据待量化数据的数据变动幅度确定目标迭代间隔,目标迭代间隔包括至少一次迭代,可以在目标迭代间隔内的各迭代使用相同的量化参数,即在目标迭代间隔内的各迭代不再更新待量化数据的量化参数。神经网络根据目标迭代间隔更新待量化数据的量化参数,包括在目标迭代间隔内的迭代,不获取预设的量化参数或不计算量化参数,即在目标迭代间隔内的迭代不更新量化参数。而在目标迭代间隔外的迭代,再获取预设的量化参数或计算量化参数,即在目标迭代间隔外的迭代更新量化参数。
可以理解的是,多个迭代之间的待量化数据或待量化数据的量化数据的数据变动幅度越小,确定出的目标迭代间隔包括的迭代次数越多。可以根据计算得到的数据变动幅度,查找预设的数据变动幅度与迭代间隔的对应关系,确定与计算得到的数据变动幅度对应的目标迭代间隔。可以根据需求预设数据变动幅度与迭代间隔的对应关系。也可以根据计算得到的数据变动幅度,利用设定的计算方法计算得到目标迭代间隔。本披露不限定数据变动幅度的计算方式,以及目标迭代间隔的获取方式。
在本实施例中,在神经网络运算的微调阶段和/或训练阶段,获取当前迭代以及历史迭代中待量化数据的数据变动幅度,根据所述待量化数据的数据变动幅度,确定待量化数据对应的目标迭代间隔,以使所述神经网络根据所述目标迭代间隔更新所述待量化数据的量化参数。可以根据多个迭代中待量 化数据或待量化数据对应的量化数据的数据变动幅度,确定目标迭代间隔。神经网络可以根据目标迭代间隔确定是否更新量化参数。由于目标迭代间隔所包括的多个迭代的数据变动幅度较小,目标迭代间隔内的迭代不更新量化参数也可以保证量化精度。而目标迭代间隔内的多个迭代不更新量化参数,可以减少量化参数的提取次数或计算次数,从而提高神经网络的运算效率。
在一种可能的实现方式中,该方法还包括:
根据所述待量化数据在所述当前迭代的数据位宽,确定所述待量化数据在所述目标迭代间隔内的迭代对应的数据位宽,以使所述神经网络根据所述待量化数据在所述目标迭代间隔内的迭代对应的数据位宽,确定量化参数。
如本公开上述实施例所述,待量化数据的量化参数可以预设,也可以根据待量化数据对应的数据位宽计算得到。而不同待量化层中待量化数据对应的数据位宽,或相同待量化层中待量化数据在不同迭代中对应的数据位宽,可以根据本公开上述实施例中的方式进行自适应调整。
当待量化数据的数据位宽不可自适应调整,为预设的数据位宽时,可以根据待量化数据在当前迭代的预设的数据位宽,确定待量化数据在目标迭代间隔内的迭代对应的数据位宽。在目标迭代间隔内的各迭代可不使用自身的预设值。
当待量化数据的数据位宽可自适应调整时,可以根据待量化数据在当前迭代对应的数据位宽,确定待量化数据在目标迭代间隔内的迭代对应的数据位宽。在数据位宽可自适应调整时,数据位宽可进行一次调整或多次调整。可以将待量化数据在当前迭代进行自适应调整后的数据位宽,作为目标迭代间隔内的各迭代对应的数据位宽,在目标迭代间隔内的各迭代不再对数据位宽进行自适应调整(更新)。待量化数据在当前迭代可以使用自适应调整后的数据位宽,也可以使用自适应调整前的数据位宽,本披露对此不作限定。
在目标迭代间隔以外的其他迭代,由于待量化数据的数据变动幅度不满足设定条件,可以根据本公开上述的方法对数据位宽进行自适应调整,得到更加符合当前迭代的待量化数据的数据位宽,也可使用本公开中的目标迭代间隔的计算方法,计算得到新的目标迭代间隔并使用,从而在保证目标迭代间隔以外的迭代的量化精度的同时,提高神经网络的运行效率。
在目标迭代间隔内的各迭代的数据位宽相同,各迭代可以根据相同的数据位宽各自计算得到对应的量化参数。量化参数可以包括点位置、缩放系数和偏移量中的至少一种。可以在目标迭代间隔内的各迭代,根据相同的数据位宽分别计算得到量化参数。量化参数包括点位置(包括第一类点位置、第二类点位置)、缩放系数(包括第一类缩放系数和第二类缩放系数)和偏移量时,在目标迭代间隔内的各迭代,可利用相同的数据位宽,分别计算各自对应的点位置、缩放系数和偏移量。
在根据当前迭代的数据位宽,确定目标迭代间隔内各迭代的数据位宽的同时,可以根据当前迭代的量化参数,确定目标迭代间隔内各迭代的对应的量化参数。目标迭代间隔内各迭代的量化参数,也不再重新根据相同的数据位宽计算得到,可以进一步提高神经网络的运算效率。可以根据当前迭代的全部量化参数或部分量化参数,确定目标迭代间隔内各迭代的对应的量化参数。当根据当前迭代的部分量化参数,确定目标迭代间隔内各迭代的对应的量化参数时,剩余部分的量化参数,在目标迭代间隔内各迭代仍需计算。
例如,量化参数包括第二类点位置、第二类缩放系数和偏移量。可以根据当前迭代的数据位宽和第二类点位置,确定目标迭代间隔内各迭代的数据位宽和第二类点位置。则目标迭代间隔内各迭代的第二类缩放系数和偏移量需要根据相同的数据位宽计算得到。也可以根据当前迭代的数据位宽、第二类点位置、第二类缩放系数和偏移量,确定目标迭代间隔内各迭代的数据位宽、第二类点位置、第二 类缩放系数和偏移量,则目标迭代间隔内各迭代的各量化参数均不需要计算得到。
在本实施例中,根据待量化数据在当前迭代对应的数据位宽,确定待量化数据在目标迭代间隔内的迭代对应的数据位宽,以使神经网络根据待量化数据在目标迭代间隔内的迭代对应的数据位宽,确定量化参数。在目标迭代间隔内的各迭代的数据位宽,根据当前迭代的数据位宽确定,由于目标迭代间隔内各迭代的待量化数据的数据变化幅度满足设定的条件,利用相同的数据位宽计算得到的量化参数,可以保证目标迭代间隔内的各迭代的量化精度。目标迭代间隔内各迭代使用相同的数据位宽,也可以提高神经网络的运算效率。在对神经网络进行量化后运算结果的准确率和神经网络的运算效率之间,达到平衡。
在一种可能的实现方式中,该方法还可以包括:根据所述待量化数据在所述当前迭代对应的点位置,确定所述待量化数据在所述目标迭代间隔内的迭代对应的点位置,所述点位置包括第一类点位置和/或第二类点位置。
其中,根据所述待量化数据在所述当前迭代对应的第一类点位置,确定所述待量化数据在所述目标迭代间隔内的迭代对应的第一类点位置。根据所述待量化数据在所述当前迭代对应的第二类点位置,确定所述待量化数据在所述目标迭代间隔内的迭代对应的第二类点位置。
在量化参数中,相对于缩放系数和偏移量,不同的点位置对相同待量化数据的量化结果产生的影响较大。可以根据待量化数据在当前迭代对应的点位置,确定目标迭代间隔内的迭代对应的点位置。当数据位宽不可自适应调整时,可以将待量化数据在当前迭代预设的点位置,作为待量化数据在目标迭代间隔内各迭代对应的点位置,也可以将待量化数据在当前迭代根据预设的数据位宽计算得到的点位置,作为待量化数据在目标迭代间隔内各迭代对应的点位置。当数据位宽可自适应调整时,可以将待量化数据在当前迭代调整后的点位置,作为待量化数据在目标迭代间隔内各迭代对应的点位置。
根据所述待量化数据在所述当前迭代对应的点位置,确定所述待量化数据在所述目标迭代间隔内的迭代对应的点位置的同时,也可以根据待量化数据在当前迭代对应的缩放系数,确定所述待量化数据在所述目标迭代间隔内的迭代对应的缩放系数,和/或根据待量化数据在当前迭代对应的偏移量,确定所述待量化数据在所述目标迭代间隔内的迭代对应的偏移量。
根据所述待量化数据在所述当前迭代对应的点位置,确定所述待量化数据在所述目标迭代间隔内的迭代对应的点位置的同时,还可以根据待量化数据在当前迭代对应的数据位宽,确定所述待量化数据在所述目标迭代间隔内的迭代对应的数据位宽,其中,待量化数据在当前迭代对应的数据位宽,可以是当前迭代预设的数据位宽或自适应调整后的数据位宽。
在本实施例中,根据待量化数据在当前迭代对应的点位置,确定待量化数据在目标迭代间隔内的迭代对应的点位置。在目标迭代间隔内的各迭代的点位置,根据当前迭代的点位置确定,由于目标迭代间隔内各迭代的待量化数据的数据变化幅度满足设定的条件,利用相同的点位置,可以保证目标迭代间隔内的各迭代的量化精度。目标迭代间隔内各迭代使用相同的点位置,也可以提高神经网络的运算效率。在对神经网络进行量化后运算结果的准确率和神经网络的运算效率之间,达到平衡。
在一种可能的实现方式中,获取当前迭代以及历史迭代中待量化数据的数据变动幅度,可以包括:
根据待量化数据在当前迭代的点位置,和根据历史迭代间隔确定的与所述当前迭代对应的历史迭代的点位置,计算待量化数据对应各迭代间隔的点位置的滑动平均值,所述点位置包括第一类点位置和/或第二类点位置;
根据所述待量化数据在当前迭代的点位置的第一滑动平均值,以及在上一迭代间隔对应迭代的点位置的第二滑动平均值,得到第一数据变动幅度;
其中,根据所述待量化数据的数据变动幅度,确定所述待量化数据对应的目标迭代间隔,以使所述神经网络根据所述目标迭代间隔更新所述待量化数据的量化参数,可以包括:
根据所述第一数据变动幅度,确定所述待量化数据对应的目标迭代间隔,以使所述神经网络根据所述目标迭代间隔更新所述待量化数据的量化参数。
其中,根据待量化数据在当前迭代的第一类点位置和根据历史迭代间隔确定的与所述当前迭代对应的历史迭代的第一类点位置,计算待量化数据对应各迭代间隔的第一类点位置的滑动平均值;根据所述待量化数据在当前迭代的第一类点位置的第一滑动平均值,以及在上一迭代间隔对应迭代的第一类点位置的第二滑动平均值,得到所述待量化数据变动幅度。或者,根据待量化数据在当前迭代的第二类点位置和根据历史迭代间隔确定的与所述当前迭代对应的历史迭代的第二类点位置,计算待量化数据对应各迭代间隔的第二类点位置的滑动平均值;根据所述待量化数据在当前迭代的第二类点位置的第一滑动平均值,以及在上一迭代间隔对应迭代的第二类点位置的第二滑动平均值,得到所述待量化数据变动幅度。
在一种可能的实现方式中,根据历史迭代间隔确定的与所述当前迭代对应的历史迭代,可以为计算目标迭代间隔的历史迭代。当前迭代与对应的目标迭代间隔之间的对应关系可以包括:
可以从当前迭代开始计数目标迭代间隔,并在当前迭代对应的目标迭代间隔结束后的下一个迭代开始重新计算目标迭代间隔。例如,当前迭代为第100代,目标迭代间隔为3,目标迭代间隔内的迭代包括:第100代、第101代和第102代,可以在第103代计算与第103代对应的目标迭代间隔,并以第103代作为新计算得到的目标迭代间隔内的第一个迭代。此时,当前迭代为第103代时,根据历史迭代间隔确定的与所述当前迭代对应的历史迭代为第100代。
可以从当前迭代的下一个迭代开始计数目标迭代间隔,并在目标迭代间隔内的最后一个迭代开始重新计算目标迭代间隔。例如,当前迭代为第100代,目标迭代间隔为3,目标迭代间隔内的迭代包括:第101代、第102代和第103代,可以在第103代计算与第103代对应的目标迭代间隔,并以第104代作为新计算得到的目标迭代间隔内的第一个迭代。此时,当前迭代为第103代时,根据历史迭代间隔确定的与所述当前迭代对应的历史迭代为第100代。
可以从当前迭代的下一个迭代开始计数目标迭代间隔,并在目标迭代间隔结束后的下一个迭代开始重新计算目标迭代间隔。例如,当前迭代为第100代,目标迭代间隔为3,目标迭代间隔内的迭代包括:第101代、第102代和第103代,可以在第104代计算与第104代对应的目标迭代间隔,并以105代为新计算得到当目标迭代间隔内的第一个迭代。此时,当前迭代为104代时,根据历史迭代间隔确定的与所述当前迭代对应的历史迭代为100代。
可以根据需求确定当前迭代以及目标迭代间隔之间的其他的对应关系,例如可以从当前迭代之后的第N个迭代开始计数目标迭代间隔,N大于1,本披露对此不作限定。
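The counting conventions above can be condensed into a short sketch (illustrative bookkeeping only; the variant numbering and names are not from the disclosure):

def next_recompute_iteration(current_iteration, interval, variant):
    # variant 1: count from the current iteration; recompute at the
    #            iteration right after the interval ends (100 -> 103).
    # variant 2: count from the next iteration; recompute at the last
    #            iteration of the interval, apply from the next (100 -> 103).
    # variant 3: count from the next iteration; recompute at the
    #            iteration after the interval ends (100 -> 104).
    if variant in (1, 2):
        return current_iteration + interval
    if variant == 3:
        return current_iteration + interval + 1
    raise ValueError("unknown variant")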
It can be understood that the computed moving averages of the point locations of the data to be quantized over the respective iteration intervals include the first moving average of the point location of the data to be quantized at the current iteration and the second moving average of the point location of the data to be quantized at the iteration corresponding to the previous iteration interval. The first moving average m^(t) of the point location corresponding to the current iteration may be computed using formula (3-27):
m^(t) ← α × s^(t) + (1 − α) × m^(t−1)    Formula (3-27)
where t is the current iteration and t−1 is the historical iteration determined according to the previous iteration interval; m^(t−1) is the second moving average of that historical iteration; s^(t) is the point location of the current iteration, which may be the first-type point location or the second-type point location; and α is the first parameter, which may be a hyperparameter.
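Formula (3-27) is a standard exponential moving average; a one-line sketch (function and argument names are assumed for illustration):

def update_point_location_average(m_prev, s_current, alpha):
    # Formula (3-27): m(t) = alpha * s(t) + (1 - alpha) * m(t-1),
    # where alpha is the first parameter (a hyperparameter).
    return alpha * s_current + (1 - alpha) * m_prev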
In this embodiment, the moving average of the point locations of the data to be quantized over the respective iteration intervals is computed according to the point location of the data to be quantized at the current iteration and the point location of the historical iteration corresponding to the current iteration determined according to the historical iteration interval; the first data variation range is obtained according to the first moving average of the point location at the current iteration and the second moving average of the point location at the iteration corresponding to the previous iteration interval. The target iteration interval corresponding to the data to be quantized is determined according to the first data variation range, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval. Since the first data variation range can be used to measure the variation trend of the point location, the target iteration interval can follow the variation trend of the point location of the data to be quantized, and the size of each computed target iteration interval can change with that trend. Since the quantization parameters are determined according to the target iteration interval, the quantized data obtained according to the quantization parameters can better conform to the variation trend of the point location of the data to be quantized, improving the operating efficiency of the neural network while ensuring quantization precision.
In one possible implementation, obtaining the first data variation range according to the first moving average of the point location of the data to be quantized at the current iteration and the second moving average of the point location at the iteration corresponding to the previous iteration interval may include:
computing the difference between the first moving average and the second moving average;
determining the absolute value of the difference as the first data variation range.
The first data variation range diff_update1 may be computed using formula (3-28):
diff_update1 = |m^(t) − m^(t−1)| = α|s^(t) − m^(t−1)|    Formula (3-28)
The target iteration interval corresponding to the data to be quantized may be determined according to the first data variation range, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval. The target iteration interval I may be computed according to formula (3-29):
I = β / diff_update1 − γ    Formula (3-29)
where β is the second parameter and γ is the third parameter. The second parameter and the third parameter may be hyperparameters.
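A sketch combining formulas (3-28) and (3-29) under the reconstruction above (the guard against a zero variation range and the floor of one iteration are added assumptions, not from the disclosure):

def target_interval_from_first_variation(m_current, m_prev, beta, gamma):
    diff_update1 = abs(m_current - m_prev)                   # formula (3-28)
    interval = int(beta / max(diff_update1, 1e-9) - gamma)   # formula (3-29)
    return max(interval, 1)  # the interval contains at least one iteration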
It can be understood that the first data variation range may be used to measure the variation trend of the point location: the larger the first data variation range, the more drastically the numerical range of the quantized data changes, and the shorter the target iteration interval I needs to be when updating the quantization parameters.
In this embodiment, the difference between the first moving average and the second moving average is computed, and the absolute value of the difference is determined as the first data variation range. An accurate first data variation range can be obtained from the difference between the moving averages.
In one possible implementation, the method may further include: obtaining a second data variation range according to the data to be quantized and its corresponding quantized data at the current iteration;
where determining, according to the data variation range of the data to be quantized, the target iteration interval corresponding to the data to be quantized, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval, may include:
determining, according to the first data variation range and the second data variation range of the data to be quantized, the target iteration interval corresponding to the data to be quantized, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval.
The second data variation range may be obtained according to the data to be quantized and its corresponding quantized data at the current iteration. The second data variation range may also be obtained according to the data to be quantized and its corresponding dequantized data at the current iteration.
Similarly, the second data variation range diff_bit between the data to be quantized and its corresponding dequantized data at the current iteration may be computed according to formula (3-30). Other error calculation methods may also be used to compute the second data variation range diff_bit between the data to be quantized and the dequantized data, which is not limited in the present disclosure.
diff_bit = log2( (Σ|z_i^(n)| − Σ|z_i|) / Σ|z_i| + 1 )    Formula (3-30)
where z_i is the data to be quantized and z_i^(n) is the dequantized data corresponding to the data to be quantized. It can be understood that the second data variation range may be used to measure the variation trend of the data bit width corresponding to the data to be quantized: the larger the second data variation range, the more likely the data to be quantized needs its corresponding data bit width updated and the shorter the iterations between updates need to be; that is, the larger the second data variation range, the smaller the target iteration interval needs to be.
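Under the reconstruction of formula (3-30) above, the second variation measure could be sketched as follows (one plausible form only; the disclosure explicitly permits other error calculations, and the names are assumptions):

import numpy as np

def diff_bit(z, z_dequantized):
    # Reconstructed formula (3-30): relative growth of total magnitude
    # between the data to be quantized and its dequantized counterpart.
    sum_z = np.sum(np.abs(z))
    sum_zn = np.sum(np.abs(z_dequantized))
    return float(np.log2((sum_zn - sum_z) / sum_z + 1.0))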
In this embodiment, the second data variation range is obtained according to the data to be quantized and its corresponding quantized data at the current iteration, and the target iteration interval corresponding to the data to be quantized is determined according to the first data variation range and the second data variation range, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval. Since the second data variation range can be used to measure the demand for changing the data bit width, a target iteration interval computed from the first data variation range and the second data variation range can track the variation of both the point location and the data bit width, and can better match the data quantization requirements of the data to be quantized itself.
In one possible implementation, obtaining the second data variation range according to the data to be quantized and its corresponding quantized data at the current iteration may include:
computing the error between the data to be quantized and its corresponding quantized data at the current iteration;
determining the square of the error as the second data variation range.
The second data variation range diff_update2 may be computed using formula (3-31):
diff_update2 = δ × diff_bit²    Formula (3-31)
where δ is the fourth parameter, which may be a hyperparameter.
It can be understood that different data bit widths yield different quantization parameters and thus different quantized data, producing different second data variation ranges. The second data variation range may be used to measure the variation trend of the data bit width: the larger the second data variation range, the shorter the target iteration interval needed to update the data bit width more frequently, that is, the smaller the target iteration interval needs to be.
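Formula (3-31) then scales the squared error by the fourth parameter; as a sketch (names assumed for illustration):

def second_variation(diff_bit_value, delta):
    # Formula (3-31): diff_update2 = delta * diff_bit ** 2,
    # with delta (the fourth parameter) a hyperparameter.
    return delta * diff_bit_value ** 2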
In one possible implementation, determining the target iteration interval corresponding to the data to be quantized according to the first data variation range and the second data variation range of the data to be quantized may include:
determining the target iteration interval corresponding to the data to be quantized according to the larger of the first data variation range and the second data variation range.
The target iteration interval may be computed according to formula (3-32):
I = β / max(diff_update1, diff_update2) − γ    Formula (3-32)
where β is the second parameter and γ is the third parameter. The second parameter and the third parameter may be hyperparameters.
It can be understood that a target iteration interval obtained from the first data variation range and the second data variation range can simultaneously measure the variation trends of the data bit width and the point location; when either trend becomes larger, the target iteration interval changes accordingly. The target iteration interval can track changes in both the data bit width and the point location and adjust correspondingly, so that the quantization parameters updated according to the target iteration interval better match the variation trend of the target data, and the quantized data obtained according to the quantization parameters better meets the quantization requirements.
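A sketch of formula (3-32) as reconstructed above, taking the larger of the two variation ranges (guard and floor are assumptions, as before):

def target_interval(diff_update1, diff_update2, beta, gamma):
    # Reconstructed formula (3-32): the interval shrinks when either the
    # point location or the data bit width is trending strongly.
    diff = max(diff_update1, diff_update2)
    return max(int(beta / max(diff, 1e-9) - gamma), 1)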
In one possible implementation, acquiring the data variation range of the data to be quantized at the current iteration and historical iterations may include:
when the current iteration is outside an update period, acquiring the data variation range of the data to be quantized at the current iteration and historical iterations, the update period including at least one iteration.
During the training process and/or fine-tuning process of the neural network operation, the data to be quantized varies considerably over the first several iterations of training or fine-tuning. If the target iteration interval were computed during these iterations, the computed target iteration interval might lose its usefulness. According to a preset update period, at each iteration within the update period, the target iteration interval is neither computed nor applied to make multiple iterations use the same data bit width or point location.
When the iterations proceed beyond the update period, that is, when the current iteration is outside the update period, the data variation range of the data to be quantized at the current iteration and historical iterations is acquired, and the target iteration interval corresponding to the data to be quantized is determined according to that data variation range, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval. For example, if the preset update period is 100 iterations, the target iteration interval is not computed from the 1st through the 100th iteration. When the iterations proceed to the 101st, that is, the current iteration is the 101st and lies outside the update period, the target iteration interval corresponding to the data to be quantized at the 101st iteration may be determined according to the data variation range of the data to be quantized at the 101st iteration and the 1st through 100th iterations, and the computed target iteration interval may be used at the 101st iteration or at an iteration a preset number of iterations after the 101st.
The update period may be counted from a preset iteration; for example, the multiple iterations in the update period may be counted from the first iteration, or from the Nth iteration, which is not limited in the present disclosure.
In this embodiment, the target iteration interval is computed and used once the iterations proceed beyond the update period. This avoids the problem that the target iteration interval is of little use in the early stage of training or fine-tuning of the neural network operation, where the data to be quantized varies greatly, and can further improve the operating efficiency of the neural network when the target iteration interval is used.
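The update-period gating amounts to a simple check before any interval computation (illustrative; the iteration numbering and names are assumptions):

def outside_update_period(current_iteration, update_period=100):
    # With a preset update period of 100, no target iteration interval is
    # computed for iterations 1..100; computation starts at iteration 101.
    return current_iteration > update_period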
In one possible implementation, the method may further include:
when the current iteration is within a preset period, determining a period interval according to the current iteration, the iteration corresponding to the current iteration in the next period of the preset period, and the iteration interval corresponding to the current iteration;
determining, according to the data bit width corresponding to the data to be quantized at the current iteration, the data bit width of the data to be quantized in the iterations within the period interval; or
determining, according to the point location corresponding to the data to be quantized at the current iteration, the point location of the data to be quantized in the iterations within the period interval.
The training process or fine-tuning process of the neural network operation may include multiple periods, each of which may include multiple iterations. One complete pass of the data used for the neural network operation constitutes one period. During training, as the iterations proceed, the weight changes of the neural network tend to stabilize; once training stabilizes, the data to be quantized, such as neurons, weights, biases, and gradients, all tend to stabilize. After the data to be quantized stabilizes, its data bit width and quantization parameters also stabilize. Likewise, in the fine-tuning process, after fine-tuning stabilizes, the data bit width and quantization parameters of the data to be quantized also stabilize.
Therefore, the preset period may be determined according to the period at which training or fine-tuning stabilizes. The periods after the period in which training or fine-tuning stabilizes may be determined as the preset periods. For example, if training stabilizes at the Mth period, the periods after the Mth period may be taken as the preset periods. Within the preset periods, one target iteration interval may be computed every other period, and the data bit width or quantization parameters adjusted once according to the computed target iteration interval, so as to reduce the number of updates of the data bit width or quantization parameters and improve the operating efficiency of the neural network.
For example, the preset periods are the periods after the Mth period. In the (M+1)th period, the target iteration interval computed from the Pth iteration of the Mth period extends up to the Qth iteration of the (M+1)th period. A corresponding target iteration interval I_{m+1} is computed from the Q_{m+1}th iteration of the (M+1)th period. In the (M+2)th period, the iteration corresponding to the Q_{m+1}th iteration of the (M+1)th period is the Q_{m+2}th iteration. The period interval spans from the Q_{m+1}th iteration of the (M+1)th period until the (Q_{m+2}+I_{m+1})th iteration of the (M+2)th period. Within the period interval, quantization parameters such as the data bit width or point location determined at the Q_{m+1}th iteration of the (M+1)th period are used at every iteration.
In this embodiment, a period interval may be set; after the training or fine-tuning of the neural network operation stabilizes, quantization parameters such as the data bit width or point location are updated once per period according to the period interval. After training or fine-tuning stabilizes, the period interval can reduce the number of updates of the data bit width or point location, improving the operating efficiency of the neural network while ensuring quantization precision.
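As a loose sketch of the period interval (global iteration indices and all names are assumptions; the disclosure only fixes the span from Q_{m+1} in period M+1 to Q_{m+2}+I_{m+1} in period M+2):

def period_interval_span(q_next, interval, iterations_per_period):
    # Parameters determined at iteration Q_{m+1} of period M+1 are reused
    # from that iteration up to iteration Q_{m+2} + I_{m+1} of period M+2,
    # where Q_{m+2} is the corresponding iteration one period later.
    start = q_next
    end = q_next + iterations_per_period + interval
    return start, end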
It should be noted that, for the sake of simple description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that the present disclosure is not limited by the described order of actions, because according to the present disclosure, certain steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.
It should further be noted that although the steps in the flowcharts of Fig. 4-1 and Fig. 4-6 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Fig. 4-1 and Fig. 4-6 may include multiple sub-steps or multiple stages, which are not necessarily completed at the same moment but may be executed at different moments, and whose execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
An embodiment of the present disclosure further provides a non-volatile computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the above data quantization processing method for a neural network.
Fig. 4-7 shows a block diagram of a neural network quantization apparatus according to an embodiment of the present disclosure. As shown in Fig. 4-7, the apparatus is applied to the processor 100 shown in Fig. 1 and includes a data determination module 3-61, a data quantization module 3-62, and a data operation module 3-63. The data determination module 3-61, data quantization module 3-62, and data operation module 3-63 may all be provided in one processing unit 101, or respectively provided in different processing units 101. The storage unit 102 is configured to store data related to the operation of the data determination module 3-61, the data quantization module 3-62, and the data operation module 3-63, such as the data to be quantized, quantization parameters, and data bit widths.
The data determination module 3-61 determines multiple pieces of data to be quantized in the target data of the layer to be quantized, each piece of data to be quantized being a subset of the target data, the target data being any kind of data to be operated on and to be quantized of the layer to be quantized, and the data to be operated on including at least one of input neurons, weights, biases, and gradients;
the data quantization module 3-62 quantizes each piece of data to be quantized according to the corresponding quantization parameters to obtain quantized data corresponding to each piece of data to be quantized;
the data operation module 3-63 obtains the quantization result of the target data according to the quantized data corresponding to each piece of data to be quantized, so that the layer to be quantized performs operations according to the quantization result of the target data.
In one possible implementation, the layer to be quantized is a convolutional layer, the target data are input neurons, and the data determination module may include:
a first determination submodule, which determines, among the input neurons of the convolutional layer, multiple pieces of data to be quantized corresponding to the convolution kernel according to the dimensions and stride of the convolution kernel, the dimensions of the convolution kernel including height, width, and number of channels.
In one possible implementation, the data determination module includes:
a second determination submodule, which determines multiple pieces of data to be quantized in the target data of the layer to be quantized according to the dimensions of the target data, the dimensions of the target data including batch, channel, height, and width.
In one possible implementation, the second determination submodule includes:
a batch-based determination submodule, which determines the data of one or more batches in the target data of the layer to be quantized as one piece of data to be quantized.
In one possible implementation, the second determination submodule includes:
a channel-based determination submodule, which determines the data of one or more channels in the target data of the layer to be quantized as one piece of data to be quantized.
In one possible implementation, the data determination module includes:
a third determination submodule, which determines multiple pieces of data to be quantized in the target data of the layer to be quantized according to the real-time processing capability of the device running the neural network, the size of each piece of data to be quantized being positively correlated with the real-time processing capability.
In one possible implementation, the apparatus further includes:
a parameter determination submodule, which computes the corresponding quantization parameters from each piece of data to be quantized and the corresponding data bit width.
In one possible implementation, the parameter determination submodule includes:
a first point location determination submodule, which, when the quantization parameters do not include an offset, obtains the first-type point location of each piece of data to be quantized according to the maximum absolute value in each piece of data to be quantized and the corresponding data bit width.
In one possible implementation, the parameter determination submodule includes:
a first maximum determination submodule, which, when the quantization parameters do not include an offset, obtains the maximum of the quantized data according to each piece of data to be quantized and the corresponding data bit width;
a first scaling factor determination submodule, which obtains the first-type scaling factor of each piece of data to be quantized according to the maximum absolute value in each piece of data to be quantized and the maximum of the quantized data.
In one possible implementation, the parameter determination submodule includes:
a second point location determination submodule, which, when the quantization parameters include an offset, obtains the second-type point location of each piece of data to be quantized according to the maximum value and minimum value in each piece of data to be quantized and the corresponding data bit width.
In one possible implementation, the parameter determination submodule includes:
a second maximum determination submodule, which, when the quantization parameters include an offset, obtains the maximum of the quantized data according to each piece of data to be quantized and the corresponding data bit width;
a second scaling factor determination submodule, which obtains the second-type scaling factor of each piece of data to be quantized according to the maximum value and minimum value in each piece of data to be quantized and the maximum of the quantized data.
In one possible implementation, the parameter determination submodule includes:
an offset determination submodule, which obtains the offset of each piece of data to be quantized according to the maximum value and minimum value in each piece of data to be quantized.
In one possible implementation, the apparatus further includes:
a first quantization error determination module, which determines the quantization error corresponding to each piece of data to be quantized according to each piece of data to be quantized and its corresponding quantized data;
an adjusted bit width determination module, which adjusts the data bit width corresponding to each piece of data to be quantized according to the corresponding quantization error and an error threshold, to obtain the adjusted bit width corresponding to each piece of data to be quantized;
an adjusted quantization parameter determination module, which updates the data bit width corresponding to each piece of data to be quantized to the corresponding adjusted bit width, and computes the corresponding adjusted quantization parameters from each piece of data to be quantized and the corresponding adjusted bit width, so that each piece of data to be quantized is quantized according to the corresponding adjusted quantization parameters.
In one possible implementation, the adjusted bit width determination module includes:
a first adjusted bit width determination submodule, which, when the quantization error is greater than a first error threshold, increases the corresponding data bit width to obtain the corresponding adjusted bit width.
In one possible implementation, the apparatus further includes:
a first post-adjustment quantization error module, which computes the post-adjustment quantization error of each piece of data to be quantized according to each piece of data to be quantized and the corresponding adjusted bit width;
a first adjusted bit width cyclic determination module, which continues to increase the corresponding adjusted bit width according to the post-adjustment quantization error and the first error threshold, until the post-adjustment quantization error is less than or equal to the first error threshold.
In one possible implementation, the adjusted bit width determination module includes:
a second adjusted bit width determination submodule, which, when the quantization error is less than a second error threshold, decreases the corresponding data bit width to obtain the corresponding adjusted bit width, the second error threshold being less than the first error threshold.
In one possible implementation, the apparatus further includes: a second post-adjustment quantization error module, which computes the post-adjustment quantization error of the data to be quantized according to the adjusted bit width and the data to be quantized; and a second adjusted bit width cyclic determination module, which continues to decrease the adjusted bit width according to the post-adjustment quantization error and the second error threshold, until the post-adjustment quantization error computed from the adjusted bit width and the data to be quantized is greater than or equal to the second error threshold.
In one possible implementation, in the fine-tuning stage and/or training stage of the neural network operation, the apparatus further includes:
a first data variation range determination module, which acquires the data variation range of the data to be quantized at the current iteration and historical iterations, the historical iterations being iterations before the current iteration;
a target iteration interval determination module, which determines, according to the data variation range of the data to be quantized, the target iteration interval corresponding to the data to be quantized, so that the layer to be quantized updates the quantization parameters of the data to be quantized according to the target iteration interval, the target iteration interval including at least one iteration.
In one possible implementation, the apparatus further includes:
a first target iteration interval application module, which determines, according to the data bit width of the data to be quantized at the current iteration, the data bit width corresponding to the iterations of the data to be quantized within the target iteration interval, so that the neural network determines the quantization parameters according to the data bit width corresponding to the iterations of the data to be quantized within the target iteration interval.
In one possible implementation, the apparatus further includes:
a second target iteration interval application module, which determines, according to the point location corresponding to the data to be quantized at the current iteration, the point location corresponding to the iterations of the data to be quantized within the target iteration interval, the point location including the first-type point location and/or the second-type point location.
In one possible implementation, the first data variation range determination module includes:
a moving average computation submodule, which computes, according to the point location of the data to be quantized at the current iteration and the point location of the historical iteration corresponding to the current iteration determined according to the historical iteration interval, a moving average of the point locations of the data to be quantized over the respective iteration intervals, the point location including the first-type point location and/or the second-type point location;
a first data variation range determination submodule, which obtains a first data variation range according to the first moving average of the point location of the data to be quantized at the current iteration and the second moving average of the point location at the iteration corresponding to the previous iteration interval;
where the target iteration interval determination module includes:
a first target iteration interval determination submodule, which determines, according to the first data variation range, the target iteration interval corresponding to the data to be quantized, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval.
In one possible implementation, the first data variation range determination submodule includes:
a first range determination submodule, which computes the difference between the first moving average and the second moving average and determines the absolute value of the difference as the first data variation range.
In one possible implementation, the apparatus further includes:
a second data variation range determination module, which obtains a second data variation range according to the data to be quantized and its corresponding quantized data at the current iteration;
where the target iteration interval determination module includes:
a second target iteration interval determination submodule, which determines, according to the first data variation range and the second data variation range of the data to be quantized, the target iteration interval corresponding to the data to be quantized, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval.
In one possible implementation, the second data variation range determination module includes:
a second range determination submodule, which computes the error between the data to be quantized and its corresponding quantized data at the current iteration, and determines the square of the error as the second data variation range.
In one possible implementation, the second target iteration interval determination submodule includes:
an interval determination submodule, which determines the target iteration interval corresponding to the data to be quantized according to the larger of the first data variation range and the second data variation range.
In one possible implementation, the first data variation range determination module includes:
a second data variation range determination submodule, which, when the current iteration is outside an update period, acquires the data variation range of the data to be quantized at the current iteration and historical iterations, the update period including at least one iteration.
In one possible implementation, the apparatus further includes:
a period interval determination module, which, when the current iteration is within a preset period, determines a period interval according to the current iteration, the iteration corresponding to the current iteration in the next period of the preset period, and the iteration interval corresponding to the current iteration;
a first period interval application module, which determines, according to the data bit width corresponding to the data to be quantized at the current iteration, the data bit width of the data to be quantized in the iterations within the period interval; or
a second period interval application module, which determines, according to the point location corresponding to the data to be quantized at the current iteration, the point location of the data to be quantized in the iterations within the period interval.
The neural network quantization apparatus provided by the embodiments of the present disclosure quantizes multiple pieces of data to be quantized in the target data with corresponding quantization parameters. While ensuring precision, it reduces the storage space occupied by stored data and guarantees the accuracy and reliability of the operation results, and it can improve the efficiency of the operation. Quantization likewise reduces the size of the neural network model and lowers the performance requirements on the terminal running the model, so that the neural network model can be applied to terminals such as mobile phones with relatively limited computing power, size, and power consumption.
In one possible implementation, a non-volatile computer-readable storage medium is further disclosed, having computer program instructions stored thereon, which, when executed by a processor, implement the above neural network quantization method.
The foregoing may be better understood according to the following clauses:
Clause C1. A neural network quantization method, wherein for any layer to be quantized in the neural network, the method includes:
determining multiple pieces of data to be quantized in the target data of the layer to be quantized, each piece of data to be quantized being a subset of the target data, the target data being any kind of data to be operated on and to be quantized of the layer to be quantized, and the data to be operated on including at least one of input neurons, weights, biases, and gradients;
quantizing each piece of data to be quantized according to the corresponding quantization parameters, to obtain quantized data corresponding to each piece of data to be quantized;
obtaining the quantization result of the target data according to the quantized data corresponding to each piece of data to be quantized, so that the layer to be quantized performs operations according to the quantization result of the target data.
Clause C2. The method according to clause C1, wherein the layer to be quantized is a convolutional layer, the target data are input neurons, and determining multiple pieces of data to be quantized in the target data of the layer to be quantized includes:
determining, among the input neurons of the convolutional layer, multiple pieces of data to be quantized corresponding to the convolution kernel according to the dimensions and stride of the convolution kernel, the dimensions of the convolution kernel including height, width, and number of channels.
Clause C3. The method according to clause C1, wherein determining multiple pieces of data to be quantized in the target data of the layer to be quantized includes:
determining multiple pieces of data to be quantized in the target data of the layer to be quantized according to the dimensions of the target data, the dimensions of the target data including batch, channel, height, and width.
Clause C4. The method according to clause C3, wherein determining multiple pieces of data to be quantized in the target data of the layer to be quantized according to the dimensions of the target data includes:
determining the data of one or more batches in the target data of the layer to be quantized as one piece of data to be quantized.
Clause C5. The method according to clause C3 or clause C4, wherein determining multiple pieces of data to be quantized in the target data of the layer to be quantized according to the dimensions of the target data includes:
determining the data of one or more channels in the target data of the layer to be quantized as one piece of data to be quantized.
Clause C6. The method according to any one of clauses C1 to C5, wherein determining multiple pieces of data to be quantized in the target data of the layer to be quantized includes:
determining multiple pieces of data to be quantized in the target data of the layer to be quantized according to the real-time processing capability of the device running the neural network, the size of each piece of data to be quantized being positively correlated with the real-time processing capability.
Clause C7. The method according to any one of clauses C1 to C6, further including:
computing the corresponding quantization parameters from each piece of data to be quantized and the corresponding data bit width.
Clause C8. The method according to clause C7, wherein computing the corresponding quantization parameters from each piece of data to be quantized and the corresponding data bit width includes:
when the quantization parameters do not include an offset, obtaining the first-type point location of each piece of data to be quantized according to the maximum absolute value in each piece of data to be quantized and the corresponding data bit width.
Clause C9. The method according to clause C7, wherein computing the corresponding quantization parameters from each piece of data to be quantized and the corresponding data bit width includes:
when the quantization parameters do not include an offset, obtaining the maximum of the quantized data according to each piece of data to be quantized and the corresponding data bit width;
obtaining the first-type scaling factor of each piece of data to be quantized according to the maximum absolute value in each piece of data to be quantized and the maximum of the quantized data.
Clause C10. The method according to clause C7, wherein computing the corresponding quantization parameters from each piece of data to be quantized and the corresponding data bit width includes:
when the quantization parameters include an offset, obtaining the second-type point location of each piece of data to be quantized according to the maximum value and minimum value in each piece of data to be quantized and the corresponding data bit width.
Clause C11. The method according to clause C7, wherein computing the corresponding quantization parameters from each piece of data to be quantized and the corresponding data bit width includes:
when the quantization parameters include an offset, obtaining the maximum of the quantized data according to each piece of data to be quantized and the corresponding data bit width;
obtaining the second-type scaling factor of each piece of data to be quantized according to the maximum value and minimum value in each piece of data to be quantized and the maximum of the quantized data.
Clause C12. The method according to clause C7, wherein computing the corresponding quantization parameters from each piece of data to be quantized and the corresponding data bit width includes:
obtaining the offset of each piece of data to be quantized according to the maximum value and minimum value in each piece of data to be quantized.
Clause C13. The method according to any one of clauses C1 to C12, further including:
determining the quantization error corresponding to each piece of data to be quantized according to each piece of data to be quantized and its corresponding quantized data;
adjusting the data bit width corresponding to each piece of data to be quantized according to the corresponding quantization error and an error threshold, to obtain the adjusted bit width corresponding to each piece of data to be quantized;
updating the data bit width corresponding to each piece of data to be quantized to the corresponding adjusted bit width, and computing the corresponding adjusted quantization parameters from each piece of data to be quantized and the corresponding adjusted bit width, so that each piece of data to be quantized is quantized according to the corresponding adjusted quantization parameters.
Clause C14. The method according to clause C13, wherein adjusting the data bit width corresponding to each piece of data to be quantized according to the corresponding quantization error and the error threshold, to obtain the adjusted bit width corresponding to each piece of data to be quantized, includes:
when the quantization error is greater than a first error threshold, increasing the corresponding data bit width to obtain the corresponding adjusted bit width.
Clause C15. The method according to clause C13 or clause C14, further including:
computing the post-adjustment quantization error of each piece of data to be quantized according to each piece of data to be quantized and the corresponding adjusted bit width;
continuing to increase the corresponding adjusted bit width according to the post-adjustment quantization error and the first error threshold, until the post-adjustment quantization error is less than or equal to the first error threshold.
Clause C16. The method according to clause C13 or clause C14, wherein adjusting the data bit width corresponding to each piece of data to be quantized according to the corresponding quantization error and the error threshold, to obtain the adjusted bit width corresponding to each piece of data to be quantized, includes:
when the quantization error is less than a second error threshold, decreasing the corresponding data bit width to obtain the corresponding adjusted bit width, the second error threshold being less than the first error threshold.
Clause C17. The method according to clause C16, further including:
computing the post-adjustment quantization error of the data to be quantized according to the adjusted bit width and the data to be quantized;
continuing to decrease the adjusted bit width according to the post-adjustment quantization error and the second error threshold, until the post-adjustment quantization error computed from the adjusted bit width and the data to be quantized is greater than or equal to the second error threshold.
Clause C18. The method according to any one of clauses C1 to C17, wherein in the fine-tuning stage and/or training stage of the neural network operation, the method further includes:
acquiring the data variation range of the data to be quantized at the current iteration and historical iterations, the historical iterations being iterations before the current iteration;
determining, according to the data variation range of the data to be quantized, the target iteration interval corresponding to the data to be quantized, so that the layer to be quantized updates the quantization parameters of the data to be quantized according to the target iteration interval, the target iteration interval including at least one iteration.
Clause C19. The method according to clause C18, further including:
determining, according to the data bit width of the data to be quantized at the current iteration, the data bit width corresponding to the iterations of the data to be quantized within the target iteration interval, so that the neural network determines the quantization parameters according to the data bit width corresponding to the iterations of the data to be quantized within the target iteration interval.
Clause C20. The method according to clause C19, further including:
determining, according to the point location corresponding to the data to be quantized at the current iteration, the point location corresponding to the iterations of the data to be quantized within the target iteration interval, the point location including the first-type point location and/or the second-type point location.
Clause C21. The method according to clause C18, wherein acquiring the data variation range of the data to be quantized at the current iteration and historical iterations includes:
computing, according to the point location of the data to be quantized at the current iteration and the point location of the historical iteration corresponding to the current iteration determined according to the historical iteration interval, a moving average of the point locations of the data to be quantized over the respective iteration intervals, the point location including the first-type point location and/or the second-type point location;
obtaining a first data variation range according to the first moving average of the point location of the data to be quantized at the current iteration and the second moving average of the point location at the iteration corresponding to the previous iteration interval;
wherein determining, according to the data variation range of the data to be quantized, the target iteration interval corresponding to the data to be quantized, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval, includes:
determining, according to the first data variation range, the target iteration interval corresponding to the data to be quantized, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval.
Clause C22. The method according to clause C21, wherein obtaining the first data variation range according to the first moving average of the point location of the data to be quantized at the current iteration and the second moving average of the point location at the iteration corresponding to the previous iteration interval includes:
computing the difference between the first moving average and the second moving average;
determining the absolute value of the difference as the first data variation range.
Clause C23. The method according to clause C22, further including:
obtaining a second data variation range according to the data to be quantized and its corresponding quantized data at the current iteration;
wherein determining, according to the data variation range of the data to be quantized, the target iteration interval corresponding to the data to be quantized, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval, includes:
determining, according to the first data variation range and the second data variation range of the data to be quantized, the target iteration interval corresponding to the data to be quantized, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval.
Clause C24. The method according to clause C23, wherein obtaining the second data variation range according to the data to be quantized and its corresponding quantized data at the current iteration includes:
computing the error between the data to be quantized and its corresponding quantized data at the current iteration;
determining the square of the error as the second data variation range.
Clause C25. The method according to clause C23, wherein determining, according to the first data variation range and the second data variation range of the data to be quantized, the target iteration interval corresponding to the data to be quantized includes:
determining the target iteration interval corresponding to the data to be quantized according to the larger of the first data variation range and the second data variation range.
Clause C26. The method according to any one of clauses C18 to C25, wherein acquiring the data variation range of the data to be quantized at the current iteration and historical iterations includes:
when the current iteration is outside an update period, acquiring the data variation range of the data to be quantized at the current iteration and historical iterations, the update period including at least one iteration.
Clause C27. The method according to any one of clauses C18 to C26, further including:
when the current iteration is within a preset period, determining a period interval according to the current iteration, the iteration corresponding to the current iteration in the next period of the preset period, and the iteration interval corresponding to the current iteration;
determining, according to the data bit width corresponding to the data to be quantized at the current iteration, the data bit width of the data to be quantized in the iterations within the period interval; or
determining, according to the point location corresponding to the data to be quantized at the current iteration, the point location of the data to be quantized in the iterations within the period interval.
Clause C28. A neural network quantization apparatus, wherein for any layer to be quantized in the neural network, the apparatus includes:
a data determination module, which determines multiple pieces of data to be quantized in the target data of the layer to be quantized, each piece of data to be quantized being a subset of the target data, the target data being any kind of data to be operated on and to be quantized of the layer to be quantized, and the data to be operated on including at least one of input neurons, weights, biases, and gradients;
a data quantization module, which quantizes each piece of data to be quantized according to the corresponding quantization parameters, to obtain quantized data corresponding to each piece of data to be quantized;
a data operation module, which obtains the quantization result of the target data according to the quantized data corresponding to each piece of data to be quantized, so that the layer to be quantized performs operations according to the quantization result of the target data.
Clause C29. The apparatus according to clause C28, wherein the layer to be quantized is a convolutional layer, the target data are input neurons, and the data determination module includes:
a first determination submodule, which determines, among the input neurons of the convolutional layer, multiple pieces of data to be quantized corresponding to the convolution kernel according to the dimensions and stride of the convolution kernel, the dimensions of the convolution kernel including height, width, and number of channels.
Clause C30. The apparatus according to clause C28, wherein the data determination module includes:
a second determination submodule, which determines multiple pieces of data to be quantized in the target data of the layer to be quantized according to the dimensions of the target data, the dimensions of the target data including batch, channel, height, and width.
Clause C31. The apparatus according to clause C30, wherein the second determination submodule includes:
a batch-based determination submodule, which determines the data of one or more batches in the target data of the layer to be quantized as one piece of data to be quantized.
Clause C32. The apparatus according to clause C30, wherein the second determination submodule includes:
a channel-based determination submodule, which determines the data of one or more channels in the target data of the layer to be quantized as one piece of data to be quantized.
Clause C33. The apparatus according to any one of clauses C28 to C32, wherein the data determination module includes:
a third determination submodule, which determines multiple pieces of data to be quantized in the target data of the layer to be quantized according to the real-time processing capability of the device running the neural network, the size of each piece of data to be quantized being positively correlated with the real-time processing capability.
Clause C34. The apparatus according to any one of clauses C28 to C33, further including:
a parameter determination submodule, which computes the corresponding quantization parameters from each piece of data to be quantized and the corresponding data bit width.
Clause C35. The apparatus according to clause C34, wherein the parameter determination submodule includes:
a first point location determination submodule, which, when the quantization parameters do not include an offset, obtains the first-type point location of each piece of data to be quantized according to the maximum absolute value in each piece of data to be quantized and the corresponding data bit width.
Clause C36. The apparatus according to clause C34, wherein the parameter determination submodule includes:
a first maximum determination submodule, which, when the quantization parameters do not include an offset, obtains the maximum of the quantized data according to each piece of data to be quantized and the corresponding data bit width;
a first scaling factor determination submodule, which obtains the first-type scaling factor of each piece of data to be quantized according to the maximum absolute value in each piece of data to be quantized and the maximum of the quantized data.
Clause C37. The apparatus according to clause C34, wherein the parameter determination submodule includes:
a second point location determination submodule, which, when the quantization parameters include an offset, obtains the second-type point location of each piece of data to be quantized according to the maximum value and minimum value in each piece of data to be quantized and the corresponding data bit width.
Clause C38. The apparatus according to clause C34, wherein the parameter determination submodule includes:
a second maximum determination submodule, which, when the quantization parameters include an offset, obtains the maximum of the quantized data according to each piece of data to be quantized and the corresponding data bit width;
a second scaling factor determination submodule, which obtains the second-type scaling factor of each piece of data to be quantized according to the maximum value and minimum value in each piece of data to be quantized and the maximum of the quantized data.
Clause C39. The apparatus according to clause C34, wherein the parameter determination submodule includes:
an offset determination submodule, which obtains the offset of each piece of data to be quantized according to the maximum value and minimum value in each piece of data to be quantized.
Clause C40. The apparatus according to any one of clauses C28 to C39, further including:
a first quantization error determination module, which determines the quantization error corresponding to each piece of data to be quantized according to each piece of data to be quantized and its corresponding quantized data;
an adjusted bit width determination module, which adjusts the data bit width corresponding to each piece of data to be quantized according to the corresponding quantization error and an error threshold, to obtain the adjusted bit width corresponding to each piece of data to be quantized;
an adjusted quantization parameter determination module, which updates the data bit width corresponding to each piece of data to be quantized to the corresponding adjusted bit width, and computes the corresponding adjusted quantization parameters from each piece of data to be quantized and the corresponding adjusted bit width, so that each piece of data to be quantized is quantized according to the corresponding adjusted quantization parameters.
Clause C41. The apparatus according to clause C40, wherein the adjusted bit width determination module includes:
a first adjusted bit width determination submodule, which, when the quantization error is greater than a first error threshold, increases the corresponding data bit width to obtain the corresponding adjusted bit width.
Clause C42. The apparatus according to clause C40 or clause C41, further including:
a first post-adjustment quantization error module, which computes the post-adjustment quantization error of each piece of data to be quantized according to each piece of data to be quantized and the corresponding adjusted bit width;
a first adjusted bit width cyclic determination module, which continues to increase the corresponding adjusted bit width according to the post-adjustment quantization error and the first error threshold, until the post-adjustment quantization error is less than or equal to the first error threshold.
Clause C43. The apparatus according to clause C40 or clause C41, wherein the adjusted bit width determination module includes:
a second adjusted bit width determination submodule, which, when the quantization error is less than a second error threshold, decreases the corresponding data bit width to obtain the corresponding adjusted bit width, the second error threshold being less than the first error threshold.
Clause C44. The apparatus according to clause C43, further including:
a second post-adjustment quantization error module, which computes the post-adjustment quantization error of the data to be quantized according to the adjusted bit width and the data to be quantized;
a second adjusted bit width cyclic determination module, which continues to decrease the adjusted bit width according to the post-adjustment quantization error and the second error threshold, until the post-adjustment quantization error computed from the adjusted bit width and the data to be quantized is greater than or equal to the second error threshold.
Clause C45. The apparatus according to any one of clauses C28 to C44, wherein in the fine-tuning stage and/or training stage of the neural network operation, the apparatus further includes:
a first data variation range determination module, which acquires the data variation range of the data to be quantized at the current iteration and historical iterations, the historical iterations being iterations before the current iteration;
a target iteration interval determination module, which determines, according to the data variation range of the data to be quantized, the target iteration interval corresponding to the data to be quantized, so that the layer to be quantized updates the quantization parameters of the data to be quantized according to the target iteration interval, the target iteration interval including at least one iteration.
Clause C46. The apparatus according to clause C45, further including:
a first target iteration interval application module, which determines, according to the data bit width of the data to be quantized at the current iteration, the data bit width corresponding to the iterations of the data to be quantized within the target iteration interval, so that the neural network determines the quantization parameters according to the data bit width corresponding to the iterations of the data to be quantized within the target iteration interval.
Clause C47. The apparatus according to clause C46, further including:
a second target iteration interval application module, which determines, according to the point location corresponding to the data to be quantized at the current iteration, the point location corresponding to the iterations of the data to be quantized within the target iteration interval, the point location including the first-type point location and/or the second-type point location.
Clause C48. The apparatus according to clause C45, wherein the first data variation range determination module includes:
a moving average computation submodule, which computes, according to the point location of the data to be quantized at the current iteration and the point location of the historical iteration corresponding to the current iteration determined according to the historical iteration interval, a moving average of the point locations of the data to be quantized over the respective iteration intervals, the point location including the first-type point location and/or the second-type point location;
a first data variation range determination submodule, which obtains a first data variation range according to the first moving average of the point location of the data to be quantized at the current iteration and the second moving average of the point location at the iteration corresponding to the previous iteration interval;
where the target iteration interval determination module includes:
a first target iteration interval determination submodule, which determines, according to the first data variation range, the target iteration interval corresponding to the data to be quantized, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval.
Clause C49. The apparatus according to clause C48, wherein the first data variation range determination submodule includes:
a first range determination submodule, which computes the difference between the first moving average and the second moving average and determines the absolute value of the difference as the first data variation range.
Clause C50. The apparatus according to clause C49, further including:
a second data variation range determination module, which obtains a second data variation range according to the data to be quantized and its corresponding quantized data at the current iteration;
where the target iteration interval determination module includes:
a second target iteration interval determination submodule, which determines, according to the first data variation range and the second data variation range of the data to be quantized, the target iteration interval corresponding to the data to be quantized, so that the neural network updates the quantization parameters of the data to be quantized according to the target iteration interval.
Clause C51. The apparatus according to clause C50, wherein the second data variation range determination module includes:
a second range determination submodule, which computes the error between the data to be quantized and its corresponding quantized data at the current iteration, and determines the square of the error as the second data variation range.
Clause C52. The apparatus according to clause C50, wherein the second target iteration interval determination submodule includes:
an interval determination submodule, which determines the target iteration interval corresponding to the data to be quantized according to the larger of the first data variation range and the second data variation range.
Clause C53. The apparatus according to any one of clauses C45 to C52, wherein the first data variation range determination module includes:
a second data variation range determination submodule, which, when the current iteration is outside an update period, acquires the data variation range of the data to be quantized at the current iteration and historical iterations, the update period including at least one iteration.
Clause C54. The apparatus according to any one of clauses C45 to C53, further including:
a period interval determination module, which, when the current iteration is within a preset period, determines a period interval according to the current iteration, the iteration corresponding to the current iteration in the next period of the preset period, and the iteration interval corresponding to the current iteration;
a first period interval application module, which determines, according to the data bit width corresponding to the data to be quantized at the current iteration, the data bit width of the data to be quantized in the iterations within the period interval; or
a second period interval application module, which determines, according to the point location corresponding to the data to be quantized at the current iteration, the point location of the data to be quantized in the iterations within the period interval.
Clause C55. An artificial intelligence chip, including the neural network quantization apparatus according to any one of clauses C28 to C54.
Clause C56. An electronic device, including the artificial intelligence chip according to clause C55.
Clause C57. A board card, including: a storage device, an interface apparatus, a control device, and the artificial intelligence chip according to clause C56;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface apparatus respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the artificial intelligence chip and an external device;
the control device is configured to monitor the state of the artificial intelligence chip.
Clause C58. The board card according to clause C57, wherein
the storage device includes multiple groups of storage units, each group of storage units being connected to the artificial intelligence chip through a bus, the storage units being DDR SDRAM;
the chip includes a DDR controller, configured to control the data transmission and data storage of each storage unit;
the interface apparatus is a standard PCIe interface.
Clause C59. A non-volatile computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the neural network quantization method according to any one of clauses C1 to C27.
It should be understood that the foregoing apparatus embodiments are merely illustrative, and the apparatus of the present disclosure may also be implemented in other ways. For example, the division of units/modules in the above embodiments is merely a logical function division, and there may be other division methods in actual implementation. For example, multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not executed.
In addition, unless otherwise specified, the functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, each unit/module may exist physically alone, or two or more units/modules may be integrated together. The above integrated units/modules may be implemented either in the form of hardware or in the form of software program modules.
If the integrated unit/module is implemented in the form of hardware, the hardware may be a digital circuit, an analog circuit, and the like. Physical implementations of the hardware structure include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the artificial intelligence processor may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the storage unit may be any appropriate magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC).
If the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disc.
In one possible implementation, an artificial intelligence chip is further disclosed, which includes the above data processing apparatus.
In one possible implementation, a board card is further disclosed, which includes a storage device, an interface apparatus, a control device, and the above artificial intelligence chip; the artificial intelligence chip is connected to the storage device, the control device, and the interface apparatus respectively; the storage device is configured to store data; the interface apparatus is configured to implement data transmission between the artificial intelligence chip and an external device; and the control device is configured to monitor the state of the artificial intelligence chip.
Fig. 5 shows a structural block diagram of a board card according to an embodiment of the present disclosure. Referring to Fig. 5, in addition to the above chip 389, the board card may include other supporting components, including but not limited to: a storage device 390, an interface apparatus 391, and a control device 392;
The storage device 390 is connected to the artificial intelligence chip through a bus and is configured to store data. The storage device may include multiple groups of storage units 393. Each group of storage units is connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
The storage unit 102 in the above processor 100 may include one or more groups of storage units 393. When the storage unit 102 includes one group of storage units 393, multiple processing units 101 share the storage units 393 for data storage. When the storage unit 102 includes multiple groups of storage units 393, a dedicated group of storage units 393 may be provided for each processing unit 101, and a shared group of storage units 393 may be provided for some or all of the multiple processing units 101.
DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read on both the rising and falling edges of the clock pulse. The speed of DDR is twice that of standard SDRAM. In one embodiment, the storage device may include 4 groups of storage units. Each group of storage units may include multiple DDR4 granules (chips). In one embodiment, the artificial intelligence chip may internally include four 72-bit DDR4 controllers, of which 64 bits are used for data transmission and 8 bits for ECC checking. It can be understood that when DDR4-3200 granules are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice in one clock cycle. A controller for controlling DDR is provided in the chip, for controlling the data transmission and data storage of each storage unit.
The interface apparatus is electrically connected to the artificial intelligence chip and is configured to implement data transmission between the artificial intelligence chip and an external device (such as a server or a computer). For example, in one embodiment, the interface apparatus may be a standard PCIe interface; the data to be processed is transferred from the server to the chip through the standard PCIe interface to realize data transfer. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface apparatus may also be another interface; the present disclosure does not limit the specific form of such other interfaces, as long as the interface unit can realize the transfer function. In addition, the computation results of the artificial intelligence chip are still transmitted back to the external device (such as a server) by the interface apparatus.
The control device is electrically connected to the artificial intelligence chip and is configured to monitor the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device may be electrically connected through an SPI interface. The control device may include a micro controller unit (MCU). Since the artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, it can drive multiple loads; therefore, the artificial intelligence chip can be in different working states such as multi-load and light-load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the artificial intelligence chip.
In one possible implementation, an electronic device is disclosed, which includes the above artificial intelligence chip. The electronic device includes a data processing apparatus, robot, computer, printer, scanner, tablet, smart terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device. The vehicle includes an airplane, ship, and/or car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance instrument, B-ultrasound scanner, and/or electrocardiograph.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in a certain embodiment, reference may be made to the relevant descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as the combinations of these technical features are not contradictory, they should be considered within the scope of this specification.
The embodiments of the present disclosure have been described in detail above, and specific examples have been applied herein to explain the principles and implementations of the present disclosure. The descriptions of the above embodiments are only used to help understand the method of the present disclosure and its core idea. Meanwhile, changes or modifications made by those skilled in the art based on the idea of the present disclosure, its specific implementations, and its scope of application all belong to the protection scope of the present disclosure. In summary, the content of this specification should not be construed as limiting the present disclosure.

Claims (45)

  1. A data quantization processing method for a neural network, characterized in that it is applied to a processor, the method comprising:
    collecting statistics on the data to be quantized according to the corresponding layer and the number of channels in the corresponding layer during the neural network operation process, and determining a statistical result for each kind of data to be quantized;
    determining the quantization parameters of each kind of data to be quantized in the corresponding layer according to the statistical result and the data bit width;
    quantizing the data to be quantized with the corresponding quantization parameters,
    wherein the data to be quantized includes at least one of the neurons, weights, and gradients of the neural network, and the quantization parameters include a point location parameter, a scaling factor, and an offset.
  2. The method according to claim 1, characterized in that when the channel of the corresponding layer is a single channel, or the corresponding layer has no channel, the quantization parameters and data bit widths corresponding to each kind of data to be quantized in the corresponding layer are the same.
  3. The method according to claim 1, characterized in that when the corresponding layer has multiple channels, the scaling factors and offsets of the weights in the same channel of the corresponding layer are the same; the point location parameters and data bit widths corresponding to the weights in all channels of the corresponding layer are the same; the quantization parameters and data bit widths corresponding to the neurons in all channels of the corresponding layer are the same; and the quantization parameters and data bit widths corresponding to the gradients in all channels of the corresponding layer are the same.
  4. The method according to any one of claims 1 to 3, characterized in that the neural network operation process includes at least one of neural network training, neural network inference, and neural network fine-tuning.
  5. The method according to any one of claims 1 to 3, characterized in that the statistical result includes any one of: the maximum absolute value in each kind of data to be quantized, or one half of the distance between the maximum value and the minimum value in each kind of data to be quantized,
    wherein the maximum absolute value is the absolute value of the maximum value or minimum value in each kind of data to be quantized.
  6. The method according to any one of claims 1 to 3, characterized in that the scaling factor is determined according to the point location parameter, the statistical result, and the data bit width.
  7. The method according to any one of claims 1 to 3, characterized in that the offset is determined according to the statistical result of each kind of data to be quantized.
  8. The method according to any one of claims 1 to 3, characterized in that the point location parameter is determined according to the statistical result and the data bit width.
  9. The method according to any one of claims 1 to 3, characterized in that the data bit width is a preset value.
  10. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
    adjusting the data bit width according to the quantization error corresponding to the data bit width, so as to determine the quantization parameters with the adjusted data bit width,
    wherein the quantization error is determined according to the quantized data in the corresponding layer and the corresponding pre-quantization data.
  11. The method according to claim 10, characterized in that adjusting the data bit width according to the quantization error corresponding to the data bit width comprises:
    comparing the quantization error with a threshold and adjusting the data bit width according to the comparison result,
    wherein the threshold includes at least one of a first threshold and a second threshold.
  12. The method according to claim 11, characterized in that comparing the quantization error with the threshold and adjusting the data bit width according to the comparison result comprises any one of:
    increasing the data bit width when the quantization error is greater than or equal to the first threshold;
    decreasing the data bit width when the quantization error is less than or equal to the second threshold;
    keeping the data bit width unchanged when the quantization error is between the first threshold and the second threshold.
  13. The method according to claim 10, characterized in that the method further comprises:
    dequantizing the quantized data to obtain dequantized data, wherein the data format of the dequantized data is the same as the data format of the corresponding pre-quantization data;
    determining the quantization error according to the quantized data and the corresponding dequantized data.
  14. The method according to claim 10, characterized in that the pre-quantization data is the data to be quantized.
  15. The method according to claim 10, characterized in that the pre-quantization data is the data to be quantized involved in the weight update iteration process within a target iteration interval;
    wherein the target iteration interval includes at least one weight update iteration, and the same data bit width is used in the quantization process within the same target iteration interval.
  16. The method according to claim 15, characterized in that the target iteration interval is determined according to the trend value of the point location parameter of the data to be quantized involved in the weight update iteration process at a pre-judgment time point,
    or the target iteration interval is determined according to the trend value of the point location parameter and the trend value of the data bit width of the data to be quantized involved in the weight update iteration process at the pre-judgment time point,
    wherein the pre-judgment time point is a time point for judging whether the data bit width needs to be adjusted, and corresponds to the time point when the weight update iteration is completed.
  17. The method according to claim 16, characterized in that the trend value of the point location parameter is determined according to the moving average of the point location parameter corresponding to the current pre-judgment time point and the moving average of the point location parameter corresponding to the previous pre-judgment time point,
    or the trend value of the point location parameter is determined according to the point location parameter corresponding to the current pre-judgment time point and the moving average of the point location parameter corresponding to the previous pre-judgment time point,
    wherein the trend value of the data bit width is determined according to the corresponding quantization error.
  18. The method according to claim 17, characterized in that the step of determining the moving average of the point location parameter corresponding to the current pre-judgment time point comprises:
    determining the point location parameter corresponding to the current pre-judgment time point according to the point location parameter corresponding to the previous pre-judgment time point and the adjustment value of the data bit width;
    adjusting the moving average of the point location parameter corresponding to the previous pre-judgment time point according to the adjustment value of the data bit width, to obtain an adjustment result;
    determining the moving average of the point location parameter corresponding to the current pre-judgment time point according to the point location parameter corresponding to the current pre-judgment time point and the adjustment result.
  19. The method according to claim 17, characterized in that the step of determining the moving average of the point location parameter corresponding to the current pre-judgment time point comprises:
    determining an intermediate result of the moving average of the point location parameter corresponding to the current pre-judgment time point according to the point location parameter corresponding to the previous pre-judgment time point and the moving average of the point location parameter corresponding to the previous pre-judgment time point;
    determining the moving average of the point location parameter corresponding to the current pre-judgment time point according to the intermediate result of the moving average of the point location parameter corresponding to the current pre-judgment time point and the adjustment value of the data bit width.
  20. The method according to claim 10, characterized in that the pre-quantization data is the data to be quantized involved in the weight update iterations within a target iteration interval; wherein the target iteration interval includes at least one weight update iteration, and the same quantization parameters are used in the quantization process within the same target iteration interval,
    wherein the target iteration interval is determined according to the trend value of the point location parameter of the data to be quantized involved in the weight update iteration process at a pre-judgment time point,
    and the pre-judgment time point is a time point for judging whether the quantization parameters need to be adjusted, corresponding to the time point when the weight update iteration is completed.
  21. A data quantization processing apparatus for a neural network, characterized in that it is applied to a processor, the apparatus comprising:
    a data statistics module, which collects statistics on the data to be quantized according to the corresponding layer and the number of channels in the corresponding layer during the neural network operation process, and determines a statistical result for each kind of data to be quantized;
    a quantization parameter determination module, which determines the quantization parameters of each kind of data to be quantized in the corresponding layer according to the statistical result and the data bit width;
    a quantization processing module, which quantizes the data to be quantized with the corresponding quantization parameters,
    wherein the data to be quantized includes at least one of the neurons, weights, and gradients of the neural network, and the quantization parameters include a point location parameter, a scaling factor, and an offset.
  22. The apparatus according to claim 21, characterized in that when the channel of the corresponding layer is a single channel, or the corresponding layer has no channel, the quantization parameters and data bit widths corresponding to each kind of data to be quantized in the corresponding layer are the same.
  23. The apparatus according to claim 21, characterized in that when the corresponding layer has multiple channels, the scaling factors and offsets of the weights in the same channel of the corresponding layer are the same; the point location parameters and data bit widths corresponding to the weights in all channels of the corresponding layer are the same; the quantization parameters and data bit widths corresponding to the neurons in all channels of the corresponding layer are the same; and the quantization parameters and data bit widths corresponding to the gradients in all channels of the corresponding layer are the same.
  24. The apparatus according to any one of claims 21 to 23, characterized in that the neural network operation process includes at least one of neural network training, neural network inference, and neural network fine-tuning.
  25. The apparatus according to any one of claims 21 to 23, characterized in that the statistical result includes any one of: the maximum absolute value in each kind of data to be quantized, or one half of the distance between the maximum value and the minimum value in each kind of data to be quantized,
    wherein the maximum absolute value is the absolute value of the maximum value or minimum value in each kind of data to be quantized.
  26. The apparatus according to any one of claims 21 to 23, characterized in that the scaling factor is determined according to the point location parameter, the statistical result, and the data bit width.
  27. The apparatus according to any one of claims 21 to 23, characterized in that the offset is determined according to the statistical result of each kind of data to be quantized.
  28. The apparatus according to any one of claims 21 to 23, characterized in that the point location parameter is determined according to the statistical result and the data bit width.
  29. The apparatus according to any one of claims 21 to 23, characterized in that the data bit width is a preset value.
  30. The apparatus according to any one of claims 21 to 23, characterized in that the apparatus further comprises:
    a bit width adjustment module, which adjusts the data bit width according to the quantization error corresponding to the data bit width, so as to determine the quantization parameters with the adjusted data bit width,
    wherein the quantization error is determined according to the quantized data in the corresponding layer and the corresponding pre-quantization data.
  31. The apparatus according to claim 30, characterized in that the bit width adjustment module comprises:
    an adjustment submodule, which compares the quantization error with a threshold and adjusts the data bit width according to the comparison result,
    wherein the threshold includes at least one of a first threshold and a second threshold.
  32. The apparatus according to claim 31, characterized in that comparing the quantization error with the threshold and adjusting the data bit width according to the comparison result comprises any one of:
    increasing the data bit width when the quantization error is greater than or equal to the first threshold;
    decreasing the data bit width when the quantization error is less than or equal to the second threshold;
    keeping the data bit width unchanged when the quantization error is between the first threshold and the second threshold.
  33. The apparatus according to claim 30, characterized in that the apparatus further comprises:
    a dequantization processing module, which dequantizes the quantized data to obtain dequantized data, wherein the data format of the dequantized data is the same as the data format of the corresponding pre-quantization data;
    a quantization error determination module, which determines the quantization error according to the quantized data and the corresponding dequantized data.
  34. The apparatus according to claim 30, characterized in that the pre-quantization data is the data to be quantized.
  35. The apparatus according to claim 30, characterized in that the pre-quantization data is the data to be quantized involved in the weight update iteration process within a target iteration interval;
    wherein the target iteration interval includes at least one weight update iteration, and the same data bit width is used in the quantization process within the same target iteration interval.
  36. The apparatus according to claim 35, characterized in that the target iteration interval is determined according to the trend value of the point location parameter of the data to be quantized involved in the weight update iteration process at a pre-judgment time point,
    or the target iteration interval is determined according to the trend value of the point location parameter and the trend value of the data bit width of the data to be quantized involved in the weight update iteration process at the pre-judgment time point,
    wherein the pre-judgment time point is a time point for judging whether the data bit width needs to be adjusted, and corresponds to the time point when the weight update iteration is completed.
  37. The apparatus according to claim 36, characterized in that the trend value of the point location parameter is determined according to the moving average of the point location parameter corresponding to the current pre-judgment time point and the moving average of the point location parameter corresponding to the previous pre-judgment time point,
    or the trend value of the point location parameter is determined according to the point location parameter corresponding to the current pre-judgment time point and the moving average of the point location parameter corresponding to the previous pre-judgment time point,
    wherein the trend value of the data bit width is determined according to the corresponding quantization error.
  38. The apparatus according to claim 37, characterized in that the apparatus further comprises:
    a first moving average determination module, which determines the point location parameter corresponding to the current pre-judgment time point according to the point location parameter corresponding to the previous pre-judgment time point and the adjustment value of the data bit width;
    adjusts the moving average of the point location parameter corresponding to the previous pre-judgment time point according to the adjustment value of the data bit width, to obtain an adjustment result;
    and determines the moving average of the point location parameter corresponding to the current pre-judgment time point according to the point location parameter corresponding to the current pre-judgment time point and the adjustment result.
  39. The apparatus according to claim 37, characterized in that the apparatus further comprises:
    a second moving average determination module, which determines an intermediate result of the moving average of the point location parameter corresponding to the current pre-judgment time point according to the point location parameter corresponding to the previous pre-judgment time point and the moving average of the point location parameter corresponding to the previous pre-judgment time point;
    and determines the moving average of the point location parameter corresponding to the current pre-judgment time point according to the intermediate result of the moving average of the point location parameter corresponding to the current pre-judgment time point and the adjustment value of the data bit width.
  40. The apparatus according to claim 30, characterized in that the pre-quantization data is the data to be quantized involved in the weight update iterations within a target iteration interval; wherein the target iteration interval includes at least one weight update iteration, and the same quantization parameters are used in the quantization process within the same target iteration interval,
    wherein the target iteration interval is determined according to the trend value of the point location parameter of the data to be quantized involved in the weight update iteration process at a pre-judgment time point,
    and the pre-judgment time point is a time point for judging whether the quantization parameters need to be adjusted, corresponding to the time point when the weight update iteration is completed.
  41. An artificial intelligence chip, characterized in that the chip comprises the data quantization processing apparatus for a neural network according to any one of claims 21 to 40.
  42. An electronic device, characterized in that the electronic device comprises the artificial intelligence chip according to claim 41.
  43. A board card, characterized in that the board card comprises: a storage device, an interface apparatus, a control device, and the artificial intelligence chip according to claim 41;
    wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface apparatus respectively;
    the storage device is configured to store data;
    the interface apparatus is configured to implement data transmission between the artificial intelligence chip and an external device;
    the control device is configured to monitor the state of the artificial intelligence chip.
  44. The board card according to claim 43, characterized in that
    the storage device comprises: multiple groups of storage units, each group of storage units being connected to the artificial intelligence chip through a bus, the storage units being DDR SDRAM;
    the chip comprises: a DDR controller, configured to control the data transmission and data storage of each storage unit;
    the interface apparatus is a standard PCIe interface.
  45. A non-volatile computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the data quantization processing method for a neural network according to any one of claims 1 to 20.
PCT/CN2020/095679 2019-08-07 2020-06-11 Data processing method and apparatus, computer device, and storage medium WO2021022903A1 (zh)

Applications Claiming Priority (12)

Application Number Priority Date Filing Date Title
CN201910726329 2019-08-07
CN201910726329.9 2019-08-07
CN201910784982.0 2019-08-23
CN201910784982 2019-08-23
CN201910798228.2 2019-08-27
CN201910798228 2019-08-27
CN201910886905.6A CN112085182A (zh) 2019-06-12 2019-09-19 Data processing method and apparatus, computer device, and storage medium
CN201910888141.4A CN112085150A (zh) 2019-06-12 2019-09-19 Quantization parameter adjustment method and apparatus, and related products
CN201910886905.6 2019-09-19
CN201910888599.X 2019-09-19
CN201910888141.4 2019-09-19
CN201910888599.XA CN112085177A (zh) 2019-06-12 2019-09-19 Data processing method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021022903A1 true WO2021022903A1 (zh) 2021-02-11

Family

ID=74502654

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/095679 WO2021022903A1 (zh) 2019-08-07 2020-06-11 数据处理方法、装置、计算机设备和存储介质

Country Status (1)

Country Link
WO (1) WO2021022903A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160328647A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Bit width selection for fixed point neural networks
CN107256422A * 2017-06-06 2017-10-17 上海兆芯集成电路有限公司 Data quantization method and apparatus
CN109472353A * 2018-11-22 2019-03-15 济南浪潮高新科技投资发展有限公司 Convolutional neural network quantization circuit and quantization method
CN109840589A * 2019-01-25 2019-06-04 深兰人工智能芯片研究院（江苏）有限公司 Method, apparatus, and system for running a convolutional neural network on an FPGA
CN110020616A * 2019-03-26 2019-07-16 深兰科技(上海)有限公司 Target recognition method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12020145B2 (en) 2017-11-03 2024-06-25 Imagination Technologies Limited End-to-end data format selection for hardware implementation of deep neural networks
US11436442B2 (en) * 2019-11-21 2022-09-06 Samsung Electronics Co., Ltd. Electronic apparatus and control method thereof
CN113554149A * 2021-06-18 2021-10-26 北京百度网讯科技有限公司 Neural network processing unit (NPU), neural network processing method, and apparatus therefor
CN113554149B * 2021-06-18 2022-04-12 北京百度网讯科技有限公司 Neural network processing unit (NPU), neural network processing method, and apparatus therefor
CN117952182A * 2024-03-25 2024-04-30 之江实验室 Mixed-precision model training method and apparatus based on data quality

Similar Documents

Publication Publication Date Title
US20220261634A1 (en) Neural network quantization parameter determination method and related products
WO2021022903A1 (zh) Data processing method and apparatus, computer device, and storage medium
WO2021036908A1 (zh) Data processing method and apparatus, computer device, and storage medium
WO2021036905A1 (zh) Data processing method and apparatus, computer device, and storage medium
WO2021036890A1 (zh) Data processing method and apparatus, computer device, and storage medium
WO2021036904A1 (zh) Data processing method and apparatus, computer device, and storage medium
JP2021177369A5 (zh)
JP2021179966A5 (zh)
JPWO2020248424A5 (zh)
WO2021036362A1 (zh) Method and apparatus for processing data, and related products
CN112085176B (zh) Data processing method and apparatus, computer device, and storage medium
WO2022111002A1 (zh) Method, device, and computer-readable storage medium for training a neural network
WO2021036892A1 (zh) Quantization parameter adjustment method and apparatus for recurrent neural networks, and related products
US20220121908A1 (en) Method and apparatus for processing data, and related product
US20220222041A1 (en) Method and apparatus for processing data, and related product
WO2021169914A1 (zh) Data quantization processing method and apparatus, electronic device, and storage medium
WO2021082653A1 (zh) Data processing method and apparatus, computer device, and storage medium
WO2021036412A1 (zh) Data processing method and apparatus, computer device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20850218

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20850218

Country of ref document: EP

Kind code of ref document: A1
