WO2021036892A1 - Quantization parameter adjustment method and device for a recurrent neural network, and related products - Google Patents


Info

Publication number: WO2021036892A1
Authority: WIPO (PCT)
Application number: PCT/CN2020/110142
Prior art keywords: iteration, data, current inspection, bit width, target
Other languages: English (en), French (fr)
Inventors: 刘少礼, 周诗怡, 张曦珊, 曾洪博
Original assignee: 安徽寒武纪信息科技有限公司 (Anhui Cambricon Information Technology Co., Ltd.)
Priority claimed from Chinese application CN201910888141.4A (external-priority patent CN112085150A)
Application filed by 安徽寒武纪信息科技有限公司
Priority to US17/622,647, publication US20220366238A1
Publication of WO2021036892A1

Classifications

    • G06N3/08 Learning methods (under G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks)
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06F18/2193 Validation; performance evaluation; active pattern learning techniques based on specific statistical tests
    • G06N3/044 Recurrent networks, e.g. Hopfield networks (under G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to a method, device and related products for adjusting quantization parameters of a recurrent neural network.
  • Traditional technology uses a fixed bit width to quantize the operation data of the recurrent neural network, that is, to convert floating-point operation data into fixed-point operation data, so as to compress the operation data of the recurrent neural network.
  • However, the traditional quantization method uses the same quantization parameter (such as the point position) for the entire recurrent neural network, which often leads to lower accuracy and affects the result of data operations.
  • The present disclosure proposes a method, device and related products for adjusting the quantization parameters of a recurrent neural network, which can improve the quantization accuracy of the neural network and ensure the correctness and reliability of the operation results.
  • The present disclosure provides a method for adjusting quantization parameters of a recurrent neural network.
  • The method includes: acquiring the data variation range of the data to be quantized, and determining a first target iteration interval according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, wherein the target iteration interval includes at least one iteration, and the quantization parameter of the recurrent neural network is used to implement the quantization operation on the data to be quantized in the operation of the recurrent neural network.
  • The present disclosure also provides a quantization parameter adjustment device for a recurrent neural network, including a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, the steps of any one of the methods described above are implemented. Specifically, when the processor executes the above computer program, the following operations are implemented: acquiring the data variation range of the data to be quantized, and determining a first target iteration interval according to the data variation range, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval. The quantization parameter of the recurrent neural network is used to implement the quantization operation on the data to be quantized in the operation of the recurrent neural network.
  • The present disclosure also provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed, the steps of any one of the above-mentioned methods are realized. Specifically, when the above computer program is executed, the following operations are implemented: acquiring the data variation range of the data to be quantized, and determining a first target iteration interval according to the data variation range, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval. The quantization parameter of the recurrent neural network is used to implement the quantization operation on the data to be quantized in the operation of the recurrent neural network.
  • The present disclosure also provides a quantization parameter adjustment device for a recurrent neural network, the device including:
  • an acquisition module, configured to acquire the data variation range of the data to be quantized; and
  • an iteration interval determination module, configured to determine a first target iteration interval according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, wherein the first target iteration interval includes at least one iteration, and the quantization parameter of the recurrent neural network is used to implement the quantization operation on the data to be quantized in the operation of the recurrent neural network.
  • The quantization parameter adjustment method, device and related products of the recurrent neural network of the present disclosure acquire the data variation range of the data to be quantized and determine the first target iteration interval according to that variation range, so that the quantization parameters of the recurrent neural network can be adjusted according to the first target iteration interval. In this way, the quantization parameters of the recurrent neural network at different operation stages can be determined according to the data distribution characteristics of the data to be quantized.
  • Compared with using the same quantization parameters for the various operation data of the same recurrent neural network, the method and device of the present disclosure can improve the accuracy of the quantization process of the recurrent neural network, and thereby ensure the accuracy and reliability of the operation results. Furthermore, the quantization efficiency can be improved by determining the target iteration interval.
  • FIG. 1 shows a schematic diagram of an application environment of a quantization parameter adjustment method according to an embodiment of the present disclosure;
  • FIG. 2 shows a schematic diagram of the correspondence between data to be quantized and quantized data according to an embodiment of the present disclosure;
  • FIG. 3 shows a schematic diagram of the conversion of data to be quantized according to an embodiment of the present disclosure;
  • FIG. 4 shows a flowchart of a method for adjusting quantization parameters of a recurrent neural network according to an embodiment of the present disclosure;
  • FIG. 5a shows a trend diagram of the variation of data to be quantized during an operation process according to an embodiment of the present disclosure;
  • FIG. 5b shows an unrolled schematic diagram of a recurrent neural network according to an embodiment of the present disclosure;
  • FIG. 5c shows a schematic diagram of the cycle of a recurrent neural network according to an embodiment of the present disclosure;
  • FIG. 6 shows a flowchart of a method for adjusting parameters of a recurrent neural network according to an embodiment of the present disclosure;
  • FIG. 7 shows a flowchart of a method for determining the variation range of a point position in an embodiment of the present disclosure;
  • FIG. 8 shows a flowchart of a method for determining a second average value in an embodiment of the present disclosure;
  • FIG. 9 shows a flowchart of a data bit width adjustment method in an embodiment of the present disclosure;
  • FIG. 10 shows a flowchart of a data bit width adjustment method in another embodiment of the present disclosure;
  • FIG. 11 shows a flowchart of a data bit width adjustment method in another embodiment of the present disclosure;
  • FIG. 12 shows a flowchart of a data bit width adjustment method in still another embodiment of the present disclosure;
  • FIG. 13 shows a flowchart of a method for determining a second average value in another embodiment of the present disclosure;
  • FIG. 14 shows a flowchart of a method for adjusting a quantization parameter according to another embodiment of the present disclosure;
  • FIG. 15 shows a flowchart of adjusting quantization parameters in a method for adjusting quantization parameters according to an embodiment of the present disclosure;
  • FIG. 16 shows a flowchart of a method for determining a first target iteration interval in a parameter adjustment method according to another embodiment of the present disclosure;
  • FIG. 17 shows a flowchart of a method for adjusting a quantization parameter according to still another embodiment of the present disclosure;
  • FIG. 18 shows a structural block diagram of a quantization parameter adjustment device according to an embodiment of the present disclosure;
  • FIG. 19 shows a structural block diagram of a board according to an embodiment of the present disclosure.
  • The operation data involved in the operation process of the recurrent neural network can be quantized, that is, operation data represented in floating point is converted into operation data represented in fixed point, thereby reducing the storage overhead of the storage device, improving memory access efficiency, and improving the computing efficiency of the computing device.
  • The traditional quantization method uses the same data bit width and quantization parameters (such as the point position) to quantize the different operation data of the recurrent neural network during its entire training process, which often leads to insufficient quantization accuracy and affects the operation results.
  • The present disclosure provides a quantization parameter adjustment method for a recurrent neural network, which can be applied to a quantization parameter adjustment device 100 including a memory 110 and a processor 120.
  • FIG. 1 is a structural block diagram of the quantization parameter adjustment device 100. The processor 120 of the quantization parameter adjustment device 100 may be a general-purpose processor or an artificial intelligence processor, and may also include both a general-purpose processor and an artificial intelligence processor, which is not specifically limited here.
  • The memory 110 may be used to store the operation data in the operation process of the recurrent neural network, and the operation data may be one or more of neuron data, weight data, or gradient data.
  • The memory 110 may also be used to store a computer program. When the computer program is executed by the above-mentioned processor 120, it can implement the quantization parameter adjustment method in the embodiments of the present disclosure.
  • This method can be applied to the training or fine-tuning process of the recurrent neural network, and dynamically adjusts the quantization parameters of the operation data according to the distribution characteristics of the operation data at different stages of the training or fine-tuning process, thereby improving the accuracy of the quantization process of the recurrent neural network and ensuring the accuracy and reliability of the operation results.
  • the artificial intelligence processor may be any appropriate hardware processor, such as CPU, GPU, FPGA, DSP, ASIC, and so on.
  • The memory may be any suitable magnetic storage medium or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), etc.
  • Quantization refers to converting operation data in a first data format into operation data in a second data format.
  • The operation data in the first data format may be floating-point operation data, and the operation data in the second data format may be fixed-point operation data. Since floating-point operation data usually occupies a large storage space, converting floating-point operation data into fixed-point operation data can save storage space and improve the memory access efficiency and computing efficiency of the operation data.
  • The quantization parameters in the quantization process may include a point position and/or a scaling factor, where the point position refers to the position of the decimal point in the quantized operation data, and the scaling factor refers to the ratio between the maximum value of the quantized data and the maximum absolute value of the data to be quantized.
  • The quantization parameter may also include an offset. The offset is used for asymmetric data to be quantized, and refers to the midpoint value of the multiple elements in the data to be quantized; specifically, the offset may be the midpoint value of the multiple elements in the data to be quantized.
  • For symmetric data to be quantized, the quantization parameter may not include an offset. In this case, quantization parameters such as the point position and/or the scaling factor can be determined according to the data to be quantized.
  • Figure 2 shows a schematic diagram of the correspondence between the data to be quantized and the quantized data according to an embodiment of the present disclosure. As shown in Figure 2, the data to be quantized is symmetric with respect to the origin. Assume that Z1 is the maximum absolute value of the elements in the data to be quantized, the data bit width corresponding to the data to be quantized is n, and A is the maximum value that can be represented by the quantized data after quantizing the data to be quantized with the data bit width n, where A = 2^s × (2^(n-1) - 1). A needs to include Z1, and Z1 must be greater than A/2, so there is the constraint of formula (1):

    2^s × (2^(n-1) - 1) >= Z1 > 2^(s-1) × (2^(n-1) - 1)    Formula (1)
  • The processor can calculate the point position s according to the maximum absolute value Z1 of the data to be quantized and the data bit width n. For example, the point position s corresponding to the data to be quantized can be calculated using the following formula (2):

    s = ceil(log2(Z1 / (2^(n-1) - 1)))    Formula (2)

  • where ceil denotes rounding up, Z1 is the maximum absolute value in the data to be quantized, s is the point position, and n is the data bit width.
  • At this time, when the point position s is used to quantize the data to be quantized, the floating-point data to be quantized F_x can be expressed as F_x ≈ I_x × 2^s, where I_x is the quantized n-bit binary representation value and s is the point position. The quantized data corresponding to the data to be quantized is:

    I_x = round(F_x / 2^s)    Formula (3)

  • where s is the point position, I_x is the quantized data, F_x is the data to be quantized, and round is the operation of rounding to the nearest integer. It is understandable that other rounding methods can also be used; for example, rounding up, rounding down, or rounding toward zero can replace the round operation in formula (3). It is also understandable that, for a given data bit width, the more digits after the decimal point in the quantized data obtained according to the point position, the greater the quantization precision of the quantized data.
  • Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

    F_x1 = round(F_x / 2^s) × 2^s    Formula (4)

  • F_x1 may be the data obtained by inverse quantization of the above quantized data I_x, where inverse quantization refers to the inverse process of quantization. The data format of the intermediate representation data F_x1 is consistent with the data format of the above data to be quantized F_x, and the intermediate representation data F_x1 may be used to calculate the quantization error, as detailed below.
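  • The point-position quantization of formulas (1) to (4) can be sketched in a few lines of Python (the data values and the bit width n = 8 are invented for illustration; this is a minimal sketch, not the patent's implementation):

```python
import math

def point_position(z1, n):
    # Formula (2): s = ceil(log2(Z1 / (2^(n-1) - 1)))
    return math.ceil(math.log2(z1 / (2 ** (n - 1) - 1)))

data = [0.3, -1.2, 0.75, 2.5]        # hypothetical data to be quantized
n = 8                                # data bit width
z1 = max(abs(x) for x in data)       # Z1: maximum absolute value
s = point_position(z1, n)
A = 2 ** s * (2 ** (n - 1) - 1)      # maximum representable value

quantized = [round(x / 2 ** s) for x in data]   # formula (3)
recovered = [i * 2 ** s for i in quantized]     # formula (4): inverse quantization
```

  • For the toy data above, Z1 = 2.5 gives s = -5 and A = 3.96875, which satisfies the formula (1) constraint A >= Z1 > A/2.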
  • Optionally, the scaling factor may include a first scaling factor, which may be calculated according to the following formula (5):

    f1 = Z1 / A = Z1 / (2^s × (2^(n-1) - 1))    Formula (5)

  • where Z1 is the maximum absolute value of the data to be quantized, and A is the maximum value that can be represented by the quantized data after quantizing the data to be quantized with the data bit width n, A = 2^s × (2^(n-1) - 1).
  • At this time, the processor can quantize the data to be quantized F_x by combining the point position and the first scaling factor to obtain the quantized data:

    I_x = round(F_x / (2^s × f1))    Formula (6)

  • where s is the point position determined according to the above formula (2), f1 is the first scaling factor, I_x is the quantized data, F_x is the data to be quantized, and round is the operation of rounding to the nearest integer. It is understandable that other rounding methods, such as rounding up, rounding down, or rounding toward zero, may replace the round operation in formula (6).
  • Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

    F_x1 = round(F_x / (2^s × f1)) × 2^s × f1    Formula (7)

  • F_x1 may be the data obtained by inverse quantization of the above quantized data I_x, where inverse quantization refers to the inverse process of quantization. The data format of the intermediate representation data F_x1 is consistent with that of the data to be quantized F_x, and it may be used to calculate the quantization error, as detailed below.
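  • Combining the point position with the first scaling factor, as in formulas (5) to (7), can be sketched as follows (same invented toy data as above; a hedged sketch only). The combined step 2^s × f1 reduces to Z1 / (2^(n-1) - 1), so the largest element maps exactly onto the edge of the n-bit signed range:

```python
import math

n = 8
data = [0.3, -1.2, 0.75, 2.5]                        # hypothetical data to be quantized
z1 = max(abs(x) for x in data)                       # Z1: maximum absolute value

s = math.ceil(math.log2(z1 / (2 ** (n - 1) - 1)))    # formula (2): point position
A = 2 ** s * (2 ** (n - 1) - 1)                      # maximum representable value
f1 = z1 / A                                          # formula (5): first scaling factor

step = 2 ** s * f1                                   # combined quantization step
quantized = [round(x / step) for x in data]          # formula (6)
recovered = [i * step for i in quantized]            # formula (7): inverse quantization
```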
  • Optionally, the scaling factor may also include a second scaling factor, which may be calculated according to the following formula (8):

    f2 = Z1 / (2^(n-1) - 1)    Formula (8)

  • The processor may use the second scaling factor alone to quantize the data to be quantized F_x to obtain the quantized data:

    I_x = round(F_x / f2)    Formula (9)

  • where f2 is the second scaling factor, I_x is the quantized data, F_x is the data to be quantized, and round is the operation of rounding to the nearest integer. Other rounding methods, such as rounding up, rounding down, or rounding toward zero, may replace the round operation in formula (9). It is understandable that when the data bit width is constant, different scaling factors can be used to adjust the numerical range of the quantized data.
  • Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

    F_x1 = round(F_x / f2) × f2    Formula (10)

  • F_x1 may be the data obtained by inverse quantization of the above quantized data I_x, where inverse quantization refers to the inverse process of quantization. The data format of the intermediate representation data F_x1 is consistent with that of the data to be quantized F_x, and it may be used to calculate the quantization error, as detailed below.
  • Optionally, the above-mentioned second scaling factor may be determined according to the point position and the first scaling factor f1. That is, the second scaling factor can be calculated according to the following formula (11):

    f2 = 2^s × f1    Formula (11)

  • where s is the point position determined according to the above formula (2), and f1 is the first scaling factor calculated according to the above formula (5).
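  • The equivalence stated by formula (11) is easy to verify numerically: the second scaling factor computed directly from Z1 coincides with 2^s × f1 (Z1 = 2.5 here is an arbitrary example value):

```python
import math

n = 8
z1 = 2.5                                             # example maximum absolute value Z1
s = math.ceil(math.log2(z1 / (2 ** (n - 1) - 1)))    # formula (2)
f1 = z1 / (2 ** s * (2 ** (n - 1) - 1))              # formula (5): first scaling factor
f2 = z1 / (2 ** (n - 1) - 1)                         # formula (8): second scaling factor

# Formula (11): f2 = 2^s * f1, so quantizing with the second scaling factor alone
# is equivalent to combining the point position with the first scaling factor.
diff = abs(f2 - 2 ** s * f1)
```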
  • Optionally, the quantization method of the embodiments of the present disclosure can realize not only the quantization of symmetric data but also the quantization of asymmetric data. At this time, the processor can convert asymmetric data into symmetric data to avoid data "overflow". Specifically, the quantization parameter may also include an offset, which may be the midpoint value of the data to be quantized and is used to indicate the offset of the midpoint value of the data to be quantized relative to the origin.
  • FIG. 3 shows a schematic diagram of the conversion of the data to be quantized according to an embodiment of the present disclosure. As shown in FIG. 3, the processor can perform statistics on the data distribution of the data to be quantized to obtain the minimum value Z_min and the maximum value Z_max among all the elements in the data to be quantized, and then calculate the above-mentioned offset according to Z_min and Z_max. The specific offset calculation method is as follows:

    o = (Z_max + Z_min) / 2    Formula (12)

  • where o represents the offset, Z_min represents the minimum value among all the elements of the data to be quantized, and Z_max represents the maximum value among all the elements of the data to be quantized.
  • Further, the processor may determine the maximum absolute value Z2 in the data to be quantized according to the minimum value Z_min and the maximum value Z_max of all the elements of the data to be quantized:

    Z2 = (Z_max - Z_min) / 2    Formula (13)

  • In this way, the processor can translate the data to be quantized according to the offset o, converting the asymmetric data to be quantized into symmetric data to be quantized, as shown in FIG. 3.
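  • The offset computation of formulas (12) and (13), and the translation it enables, can be sketched as follows (the asymmetric toy data is invented for illustration):

```python
data = [-0.5, 0.1, 0.9, 3.5]             # hypothetical asymmetric data to be quantized
z_min, z_max = min(data), max(data)

o = (z_max + z_min) / 2                  # formula (12): offset = midpoint value
z2 = (z_max - z_min) / 2                 # formula (13): max absolute value after translation

shifted = [x - o for x in data]          # translated data, symmetric about the origin
```

  • After subtracting the offset, the translated data is symmetric: its maximum absolute value equals Z2, so the symmetric quantization above applies without wasting representable range.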
  • Further, the processor may determine the point position s according to the maximum absolute value Z2 in the data to be quantized, where the point position may be calculated according to the following formula (14):

    s = ceil(log2(Z2 / (2^(n-1) - 1)))    Formula (14)

  • where ceil denotes rounding up, s is the point position, and n is the data bit width.
  • Further, the processor can quantize the data to be quantized according to the offset and the corresponding point position to obtain the quantized data:

    I_x = round((F_x - o) / 2^s)    Formula (15)

  • where s is the point position determined according to the above formula (14), o is the offset, I_x is the quantized data, F_x is the data to be quantized, and round is the operation of rounding to the nearest integer. Other rounding methods, such as rounding up, rounding down, or rounding toward zero, may replace the round operation in formula (15).
  • Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

    F_x1 = round((F_x - o) / 2^s) × 2^s + o    Formula (16)

  • F_x1 may be the data obtained by inverse quantization of the above quantized data I_x, where inverse quantization refers to the inverse process of quantization. The data format of the intermediate representation data F_x1 is consistent with that of the data to be quantized F_x, and it may be used to calculate the quantization error, as detailed below.
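  • Quantization with an offset and a point position, formulas (14) to (16), can be sketched as follows (same invented asymmetric toy data; a hedged sketch only):

```python
import math

n = 8
data = [-0.5, 0.1, 0.9, 3.5]                          # hypothetical asymmetric data
z_min, z_max = min(data), max(data)
o = (z_max + z_min) / 2                               # formula (12): offset
z2 = (z_max - z_min) / 2                              # formula (13)

s = math.ceil(math.log2(z2 / (2 ** (n - 1) - 1)))     # formula (14): point position

quantized = [round((x - o) / 2 ** s) for x in data]   # formula (15)
recovered = [i * 2 ** s + o for i in quantized]       # formula (16): inverse quantization
```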
  • Further, the processor may determine the point position s and the first scaling factor f1 according to the maximum absolute value Z2 in the data to be quantized, where the specific calculation method of the point position s is given in the above formula (14). The first scaling factor f1 can be calculated according to the following formula (17):

    f1 = Z2 / A = Z2 / (2^s × (2^(n-1) - 1))    Formula (17)
  • The processor may quantize the data to be quantized according to the offset and its corresponding first scaling factor f1 and point position s to obtain the quantized data:

    I_x = round((F_x - o) / (2^s × f1))    Formula (18)

  • where f1 is the first scaling factor, s is the point position determined according to the above formula (14), o is the offset, I_x is the quantized data, F_x is the data to be quantized, and round is the operation of rounding to the nearest integer. Other rounding methods, such as rounding up, rounding down, or rounding toward zero, may replace the round operation in formula (18).
  • Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

    F_x1 = round((F_x - o) / (2^s × f1)) × 2^s × f1 + o    Formula (19)

  • where f1 is the first scaling factor, s is the point position determined according to the above formula (14), o is the offset, F_x is the data to be quantized, and round is the operation of rounding to the nearest integer.
  • F_x1 may be the data obtained by inverse quantization of the above quantized data I_x, where inverse quantization refers to the inverse process of quantization. The data format of the intermediate representation data F_x1 is consistent with that of the data to be quantized F_x, and it may be used to calculate the quantization error, as detailed below.
  • Optionally, the scaling factor may also include a second scaling factor, which may be calculated according to the following formula (20):

    f2 = Z2 / (2^(n-1) - 1)    Formula (20)

  • The processor may use the second scaling factor together with the offset to quantize the data to be quantized F_x to obtain the quantized data:

    I_x = round((F_x - o) / f2)    Formula (21)

  • where f2 is the second scaling factor, o is the offset, I_x is the quantized data, F_x is the data to be quantized, and round is the operation of rounding to the nearest integer. Other rounding methods, such as rounding up, rounding down, or rounding toward zero, may replace the round operation in formula (21). It is understandable that when the data bit width is constant, different scaling factors can be used to adjust the numerical range of the quantized data.
  • Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

    F_x1 = round((F_x - o) / f2) × f2 + o    Formula (22)

  • F_x1 may be the data obtained by inverse quantization of the above quantized data I_x, where inverse quantization refers to the inverse process of quantization. The data format of the intermediate representation data F_x1 is consistent with that of the data to be quantized F_x, and it may be used to calculate the quantization error, as detailed below.
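  • Quantization with an offset and the second scaling factor, formulas (20) to (22), can be sketched the same way (invented toy data; note that the extreme elements map exactly to the edges of the signed n-bit range):

```python
n = 8
data = [-0.5, 0.1, 0.9, 3.5]                     # hypothetical asymmetric data
z_min, z_max = min(data), max(data)
o = (z_max + z_min) / 2                          # formula (12): offset
z2 = (z_max - z_min) / 2                         # formula (13)

f2 = z2 / (2 ** (n - 1) - 1)                     # formula (20): second scaling factor
quantized = [round((x - o) / f2) for x in data]  # formula (21)
recovered = [i * f2 + o for i in quantized]      # formula (22): inverse quantization
```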
  • Optionally, the above-mentioned second scaling factor may be determined according to the point position and the first scaling factor f1. That is, the second scaling factor can be calculated according to the following formula (23):

    f2 = 2^s × f1    Formula (23)

  • where s is the point position determined according to the above formula (14), and f1 is the first scaling factor calculated according to the above formula (17).
  • Optionally, the processor may also quantize the data to be quantized according to the offset o alone; at this time, the point position s and/or the scaling factor may be preset values. The processor quantizes the data to be quantized according to the offset to obtain the quantized data:

    I_x = round(F_x - o)    Formula (24)

  • where o is the offset, I_x is the quantized data, F_x is the data to be quantized, and round is the operation of rounding to the nearest integer. Other rounding methods, such as rounding up, rounding down, or rounding toward zero, may replace the round operation in formula (24). It is understandable that when the data bit width is constant, different offsets can be used to adjust the offset between the value of the quantized data and the data before quantization.
  • Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

    F_x1 = round(F_x - o) + o    Formula (25)

  • F_x1 may be the data obtained by inverse quantization of the above quantized data I_x, where inverse quantization refers to the inverse process of quantization. The data format of the intermediate representation data F_x1 is consistent with that of the data to be quantized F_x, and it may be used to calculate the quantization error, as detailed below.
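  • Quantization by offset alone, formulas (24) and (25), amounts to rounding after subtracting the midpoint; with a quantization step of 1, the inverse-quantized value then differs from the original by at most 0.5 (toy data invented for illustration):

```python
data = [10.2, 11.8, 13.1, 14.6]           # hypothetical asymmetric data to be quantized
o = (max(data) + min(data)) / 2           # formula (12): offset = midpoint value

quantized = [round(x - o) for x in data]  # formula (24)
recovered = [i + o for i in quantized]    # formula (25): inverse quantization
```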
  • Optionally, the quantization operation of the present disclosure can be used not only for the quantization of the floating-point data described above, but also for the quantization of fixed-point data.
  • Optionally, the operation data in the first data format may also be fixed-point operation data, and the operation data in the second data format may be fixed-point operation data whose data representation range is smaller than that of the first data format and whose number of decimal places is greater than that of the first data format; that is, the operation data in the second data format has higher precision than the operation data in the first data format. For example, the operation data in the first data format is fixed-point data occupying 16 bits, and the operation data in the second data format may be fixed-point data occupying 8 bits.
  • In the embodiments of the present disclosure, quantization processing can be performed on fixed-point operation data, thereby further reducing the storage space occupied by the operation data and improving the memory access efficiency and computing efficiency of the operation data.
  • The method for adjusting the quantization parameters of the recurrent neural network in an embodiment of the present disclosure can be applied to the training or fine-tuning process of the recurrent neural network, so as to dynamically adjust the quantization parameters of the operation data during the operation of the recurrent neural network in the training or fine-tuning process, thereby improving the quantization accuracy of the recurrent neural network.
  • The recurrent neural network may be a deep recurrent neural network or a convolutional recurrent neural network, etc., which is not specifically limited here.
  • an iterative operation generally includes a forward operation, a reverse operation, and a weight update operation.
  • Forward operation refers to the process of forward inference based on the input data of the recurrent neural network to obtain the result of the forward operation.
  • the reverse operation is a process of determining the loss value according to the result of the forward operation and the preset reference value, and determining the weight gradient value and/or the gradient value of the input data according to the loss value.
  • the weight update operation refers to the process of adjusting the weight of the recurrent neural network according to the gradient of the weight.
  • the training process of the recurrent neural network is as follows: the processor may use the recurrent neural network with a weight value of a random number to perform a forward operation on the input data to obtain a forward operation result. The processor then determines the loss value according to the forward operation result and the preset reference value, and determines the weight gradient value and/or the input data gradient value according to the loss value. Finally, the processor can update the gradient value of the recurrent neural network according to the weight gradient value, obtain a new weight value, and complete an iterative operation.
  • the processor cyclically executes multiple iterative operations until the forward operation result of the cyclic neural network meets the preset condition. For example, when the forward operation result of the recurrent neural network converges to the preset reference value, the training ends. Or, when the loss value determined by the forward operation result of the recurrent neural network and the preset reference value is less than or equal to the preset accuracy, the training ends.
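The iterative flow just described (forward operation, loss against a preset reference value, backward operation, weight update, repeated until the loss reaches a preset accuracy) can be sketched on a toy scalar model. The quadratic loss and the learning rate below are illustrative assumptions for the sketch, not part of the disclosure:

```python
def train(weight, target, lr=0.1, preset_accuracy=1e-6, max_iters=1000):
    """One scalar 'network': the forward result is the weight itself,
    and the loss is the squared distance to the preset reference value.

    Repeats forward -> loss -> gradient -> weight update until the loss
    value is less than or equal to the preset accuracy.
    """
    for iteration in range(max_iters):
        y = weight                      # forward operation
        loss = (y - target) ** 2        # loss vs. preset reference value
        if loss <= preset_accuracy:     # training-end condition
            break
        grad = 2 * (y - target)         # backward operation (weight gradient)
        weight = weight - lr * grad     # weight update operation
    return weight, loss

w, loss = train(weight=0.0, target=1.0)
```

Each pass through the loop body is one "iterative operation" in the sense of the text above; training ends when the loss determined from the forward result and the reference value is no greater than the preset accuracy.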
  • Fine-tuning refers to the process of performing multiple iterative operations on the cyclic neural network (the weight of the cyclic neural network is already in a convergent state rather than a random number), so that the accuracy of the cyclic neural network can meet the preset requirements.
  • This fine-tuning process is basically the same as the above-mentioned training process, and can be regarded as a process of retraining the recurrent neural network in a convergent state.
  • Inference refers to the process of using cyclic neural networks whose weights meet preset conditions to perform forward operations to realize functions such as recognition or classification, such as the use of cyclic neural networks for image recognition and so on.
  • FIG. 4 shows a flowchart of a method for adjusting a quantization parameter of a recurrent neural network according to an embodiment of the present disclosure. As shown in FIG. 4, the above method may include step S100 to step S200.
  • step S100 the data variation range of the data to be quantized is obtained.
  • the processor may directly read the data variation range of the data to be quantized, and the data variation range of the data to be quantized may be input by the user.
  • the processor may also calculate the data variation range of the data to be quantized according to the data to be quantized in the current inspection iteration and the data to be quantized in historical iterations, where the current inspection iteration refers to the iterative operation currently being performed, and a historical iteration refers to an iterative operation performed before the current inspection iteration.
  • the processor can obtain the maximum value and the average value of the elements in the data to be quantized in the current inspection iteration, as well as the maximum value and the average value of the elements in the data to be quantized in each historical iteration, and determine the variation range of the data to be quantized according to the maximum value and the average value of the elements in each iteration.
  • the data variation range of the data to be quantified can be represented by the moving average or variance of the data to be quantified, which is not specifically limited here.
  • the data variation range of the data to be quantized can be used to determine whether the quantization parameter of the data to be quantized needs to be adjusted. For example, if the data to be quantized has a large data variation range, it can indicate that the quantization parameters need to be adjusted in time to ensure the quantization accuracy. If the data change range of the data to be quantified is small, the quantization parameter of the historical iteration can be used for the current inspection iteration and a certain number of iterations thereafter, thereby avoiding frequent adjustment of the quantization parameter and improving quantization efficiency.
  • each iteration involves at least one to-be-quantized data
  • the data to be quantized may be operation data represented by floating-point numbers or operation data represented by fixed-point numbers.
  • the data to be quantified in each iteration may be at least one of neuron data, weight data, or gradient data
  • the gradient data may also include neuron gradient data, weight gradient data, and the like.
  • a first target iteration interval is determined according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, wherein the first target iteration interval includes at least one iteration, and the quantization parameter of the recurrent neural network is used to implement the quantization operation on the data to be quantized in the recurrent neural network operation.
  • the quantization parameter may include the above-mentioned point position and/or zoom factor, wherein the zoom factor may include a first zoom factor and a second zoom factor.
  • the specific point position calculation method can refer to the above formula (2)
  • the calculation method of the scaling factor can refer to the above formula (5) or (8), which will not be repeated here.
  • the quantization parameter may also include an offset.
  • the processor may also determine the point position according to the above formula (14), and determine the scaling factor according to formula (17) or (20), which will not be repeated here.
  • the processor may update at least one of the above-mentioned point position, scaling factor, or offset according to the determined target iteration interval to adjust the quantization parameter in the cyclic neural network operation.
  • the quantization parameter in the cyclic neural network operation can be updated according to the data variation range of the data to be quantized in the cyclic neural network operation, so that the quantization accuracy can be guaranteed.
  • the data change curve of the data to be quantified can be obtained by performing statistics and analysis on the change trend of the calculation data during the training or fine-tuning process of the recurrent neural network.
  • Fig. 5a shows the variation trend diagram of the data to be quantified in the calculation process according to an embodiment of the present disclosure.
  • from the data variation curve, it can be seen that in the initial stage of recurrent neural network training or fine-tuning, the data to be quantized varies drastically across different iterations; as the training or fine-tuning operation proceeds, the variation of the data to be quantized across different iterations gradually flattens.
  • the quantization parameters can be adjusted more frequently; in the middle and late stages of cyclic neural network training or fine-tuning, the quantization parameters can be adjusted at intervals of multiple iterations or cycles.
  • the method of the present disclosure is to determine a suitable iteration interval to achieve a balance between quantization accuracy and quantization efficiency.
  • the processor may determine the first target iteration interval according to the data variation range of the data to be quantified, so as to adjust the quantization parameter in the cyclic neural network operation according to the first target iteration interval.
  • the first target iteration interval may increase as the data variation range of the data to be quantized decreases. That is to say, when the data fluctuation range of the data to be quantized is larger, the first target iteration interval is smaller, which indicates that the adjustment of the quantization parameter is more frequent.
  • the smaller the data variation range of the data to be quantized, the larger the first target iteration interval, which indicates that the quantization parameter is adjusted less frequently.
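The inverse relationship between the data variation range and the first target iteration interval can be illustrated with a simple mapping. The constants and the particular functional form below are illustrative assumptions; the disclosure only requires that the interval grow as the variation range shrinks:

```python
def first_target_iteration_interval(variation, base=8, min_interval=1,
                                    max_interval=100):
    # Larger variation range -> smaller interval (the quantization
    # parameters are updated more often); smaller variation range ->
    # larger interval (updates are less frequent).
    interval = int(base / (variation + 1e-9))
    return max(min_interval, min(max_interval, interval))

small = first_target_iteration_interval(0.1)   # stable data: long interval
large = first_target_iteration_interval(4.0)   # volatile data: short interval
```

Clamping to a minimum interval of one iteration guarantees that the quantization parameters are always revisited eventually, even when the data fluctuates strongly.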
  • the above-mentioned first target iteration interval may also be a hyperparameter.
  • the first target iteration interval may be customized by a user.
  • the various data to be quantified may have different iteration intervals.
  • the processor may obtain the data variation amplitudes corresponding to various data to be quantized, so as to determine the first target iteration interval corresponding to the corresponding data to be quantized according to the data variation amplitudes of each type of data to be quantized.
  • the quantization process of various data to be quantized can be performed asynchronously.
  • different data variation ranges of the data to be quantized can be used to determine the corresponding first target iteration intervals, and the corresponding quantization parameters are determined according to the corresponding first target iteration intervals, so that the quantization accuracy of the data to be quantized can be ensured, and the correctness of the calculation results of the recurrent neural network can be guaranteed.
  • the same target iteration interval (including any one of the first target iteration interval, the preset iteration interval, and the second target iteration interval) may also be determined for different types of data to be quantified, so as to be based on the The target iteration interval adjusts the quantization parameter corresponding to the data to be quantized.
  • the processor may obtain the data variation amplitudes of various data to be quantized, and determine the target iteration interval according to the largest data variation amplitude of the data to be quantized, and determine the quantization parameters of various data to be quantized according to the target iteration interval.
  • different types of data to be quantized can also use the same quantization parameter.
  • the aforementioned cyclic neural network may include at least one arithmetic layer, and the data to be quantified may be at least one of neuron data, weight data, or gradient data involved in each arithmetic layer.
  • the processor can obtain the data to be quantized related to the current arithmetic layer, and determine the data variation range of various data to be quantized in the current arithmetic layer and the corresponding first target iteration interval according to the foregoing method.
  • the processor may determine the aforementioned data variation range of the data to be quantized once in each iteration operation process, and determine a first target iteration interval according to the data variation range of the corresponding data to be quantized.
  • the processor may calculate the first target iteration interval once in each iteration.
  • the processor may select the inspection iteration from each iteration according to preset conditions, determine the variation range of the data to be quantified at each inspection iteration, and determine the quantization parameters and the like according to the first target iteration interval corresponding to the inspection iteration. Update adjustments. At this time, if the iteration is not the selected inspection iteration, the processor may ignore the first target iteration interval corresponding to the iteration.
  • each target iteration interval may correspond to a verification iteration
  • the verification iteration may be the initial iteration of the target iteration interval or the end iteration of the target iteration interval.
  • the processor can adjust the quantization parameter of the cyclic neural network at the inspection iteration of each target iteration interval, so as to adjust the quantization parameter of the cyclic neural network operation according to the target iteration interval.
  • the verification iteration may be a time point for verifying whether the current quantization parameter meets the requirements of the data to be quantified.
  • the quantization parameter before adjustment may be the same as the quantization parameter after adjustment, or may be different from the quantization parameter after adjustment.
  • the interval between adjacent inspection iterations may be greater than or equal to a target iteration interval.
  • the target iteration interval may count the number of iterations starting from the current inspection iteration, and the current inspection iteration may be the starting iteration of the target iteration interval. For example, if the current inspection iteration is the 100th iteration, and the processor determines that the target iteration interval is 3 according to the data variation range of the data to be quantized, then the processor can determine that the target iteration interval includes 3 iterations, namely the 100th iteration, the 101st iteration, and the 102nd iteration. The processor can adjust the quantization parameter in the recurrent neural network operation at the 100th iteration. Here, the current inspection iteration is the iterative operation being performed when the processor currently updates and adjusts the quantization parameter.
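The two counting conventions (the interval starting at the current inspection iteration, or at the iteration after it) can be sketched as follows; the helper name is hypothetical:

```python
def interval_iterations(current_check_iter, interval, start_at_current=True):
    """Return the iterations covered by a target iteration interval.

    With start_at_current=True, the current inspection iteration is the
    starting iteration of the interval; otherwise counting starts from
    the next iteration (the current one terminates the previous interval).
    """
    start = current_check_iter if start_at_current else current_check_iter + 1
    return list(range(start, start + interval))

covered = interval_iterations(100, 3)              # [100, 101, 102]
covered_next = interval_iterations(100, 3, False)  # [101, 102, 103]
```

The first call reproduces the example in which the quantization parameter is adjusted at the 100th iteration; the second reproduces the example in which it is adjusted at the 100th and 103rd iterations.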
  • the target iteration interval may also be the number of iterations calculated from the next iteration of the current inspection iteration, and the current inspection iteration may be the termination iteration of the previous iteration interval before the current inspection iteration.
  • for example, if the current inspection iteration is the 100th iteration, and the processor determines that the target iteration interval is 3 according to the data variation range of the data to be quantized, then the processor can determine that the target iteration interval includes 3 iterations, namely the 101st iteration, the 102nd iteration, and the 103rd iteration. The processor can adjust the quantization parameter in the recurrent neural network operation at the 100th iteration and the 103rd iteration.
  • the present disclosure does not specifically limit the method for determining the target iteration interval.
  • Fig. 5b shows an expanded schematic diagram of a recurrent neural network according to an embodiment of the present disclosure.
  • a schematic diagram of the unfolding of the hidden layer of the cyclic neural network is given, and t-1, t, t+1 represent the time series.
  • X represents the input sample.
  • W represents the weight of the hidden state at the previous moment,
  • U represents the weight of the input sample at the current moment,
  • V represents the weight of the output sample. Because different recurrent neural networks unfold into different numbers of layers, the total number of iterations contained in different cycles differs when the quantization parameters of the recurrent neural network are updated.
  • FIG. 5c shows a schematic diagram of a cycle of a recurrent neural network according to an embodiment of the present disclosure.
  • iter 1 , iter 2 , iter 3 , and iter 4 are the four cycles of the recurrent neural network.
  • the first cycle iter 1 includes four iterations of t 0 , t 1 , t 2 , and t 3 .
  • the second cycle iter 2 includes two iterations t 0 and t 1.
  • the third cycle iter 3 includes three iterations t 0 , t 1 , and t 2.
  • the fourth cycle iter 4 includes five iterations of t 0 , t 1 , t 2 , t 3 , and t 4.
  • FIG. 6 shows a flowchart of a method for adjusting parameters of a recurrent neural network according to an embodiment of the present disclosure.
  • the above-mentioned operation S100 may include operation S110, and operation S200 may include operation S210 (see below for details).
  • the variation range of the point position can indirectly reflect the variation range of the data to be quantified.
  • the variation range of the point position may be determined according to the point position of the current inspection iteration and the point position of at least one historical iteration. Among them, the point position of the current test iteration and the point position of each historical iteration can be determined according to formula (2). Of course, the point position of the current test iteration and the point position of each historical iteration can also be determined according to formula (14).
  • the processor may also calculate the variance of the point position of the current test iteration and the point position of the historical iteration, and determine the variation range of the point position according to the variance.
  • the processor may determine the variation range of the point position according to the average value of the point position of the current inspection iteration and the point position of the historical iteration.
  • the above-mentioned operation S110 may include operation S111 to operation S113, and operation S210 may include operation S211 (see the following description for details).
  • S111 Determine a first average value according to the point position corresponding to the previous inspection iteration before the current inspection iteration and the point position corresponding to the historical iteration before the previous inspection iteration.
  • the previous inspection iteration is the iteration corresponding to the last time the quantization parameter is adjusted, and there is at least one iteration interval between the previous inspection iteration and the current inspection iteration.
  • At least one historical iteration may belong to at least one iteration interval, each iteration interval may correspond to one inspection iteration, and two adjacent inspection iterations may have one iteration interval.
  • the previous inspection iteration in the foregoing operation S111 may be the inspection iteration corresponding to the previous iteration interval before the target iteration interval.
  • the first average value can be calculated according to the following formula:
  • M1 = a1×s t-1 + a2×s t-2 + a3×s t-3 + ... + am×s 1 ;
  • where a1~am refer to the calculation weights corresponding to the point positions of each iteration,
  • s t-1 refers to the point position corresponding to the previous inspection iteration,
  • s t-2 refers to the point position corresponding to the iteration preceding the previous inspection iteration,
  • s t-3 refers to the point position corresponding to the iteration before that, and so on,
  • M1 refers to the above-mentioned first mean value.
  • the last test iteration is the 100th iteration of the cyclic neural network operation
  • the historical iteration can be from the 1st iteration to the 99th iteration
  • the processor can obtain the point position of the 100th iteration (that is, s t-1 ), and obtain the point positions of the historical iterations before the 100th iteration: s 1 can refer to the point position corresponding to the 1st iteration of the recurrent neural network, ..., s t-3 can refer to the point position corresponding to the 98th iteration of the recurrent neural network, and s t-2 can refer to the point position corresponding to the 99th iteration of the recurrent neural network.
  • the processor may calculate the first average value according to the above formula.
  • the first average value may be calculated according to the point positions of the inspection iterations corresponding to each iteration interval.
  • the first average value can be calculated according to the following formula:
  • M1 a1 ⁇ s t-1 +a2 ⁇ s t-2 +a3 ⁇ s t-3 +...+am ⁇ s 1 ;
  • a1 ⁇ am refers to the calculated weights corresponding to the point positions of each inspection iteration
  • s t-1 refers to the point position corresponding to the previous inspection iteration,
  • s t-2 refers to the point position corresponding to the inspection iteration of the iteration interval before the previous inspection iteration,
  • s t-3 refers to the point position corresponding to the inspection iteration of the iteration interval before that, and so on,
  • M1 refers to the above-mentioned first mean value.
  • the last test iteration is the 100th iteration of the cyclic neural network operation
  • the historical iteration can be from the 1st iteration to the 99th iteration
  • the 99 historical iterations can be divided into 11 iteration intervals.
  • the 1st iteration to the 9th iteration belong to the 1st iteration interval
  • the 10th iteration to the 18th iteration belong to the 2nd iteration interval
  • the 90th iteration to the 99th iteration belong to the 11th iteration Iteration interval.
  • the processor can obtain the point position of the 100th iteration (that is, s t-1 ), and obtain the point positions of the inspection iterations of the iteration intervals before the 100th iteration: s 1 can refer to the point position corresponding to the inspection iteration of the 1st iteration interval of the recurrent neural network (for example, the point position corresponding to the 1st iteration), ..., s t-3 can refer to the point position corresponding to the inspection iteration of the 10th iteration interval (for example, the point position corresponding to the 81st iteration), and s t-2 can refer to the point position corresponding to the inspection iteration of the 11th iteration interval (for example, the point position corresponding to the 90th iteration).
  • the processor may calculate the first average value M1 according to the above formula.
  • the iteration interval includes the same number of iterations.
  • the number of iterations included in the iteration interval of the cyclic neural network is not the same.
  • the number of iterations included in the iteration interval increases with the increase of iterations, that is, as the training or fine-tuning of the cyclic neural network proceeds, the iteration interval may become larger and larger.
  • the above-mentioned first mean value M1 can also be calculated by using the following formula:
  • M1 = α×s t-1 + (1−α)×M0;
  • where α refers to the calculation weight of the point position corresponding to the previous inspection iteration, s t-1 refers to the point position corresponding to the previous inspection iteration, and M0 refers to the moving average corresponding to the inspection iteration before the previous inspection iteration.
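The running-mean form of M1 can be sketched as follows, assuming (as is conventional for such moving averages) that the weight α is a hyperparameter in (0, 1):

```python
def update_first_mean(point_pos_prev_check, m0, alpha=0.5):
    # M1 = alpha * s_(t-1) + (1 - alpha) * M0: an exponential moving
    # average of the point positions seen at successive inspection
    # iterations, so older point positions decay geometrically.
    return alpha * point_pos_prev_check + (1 - alpha) * m0

m1 = update_first_mean(point_pos_prev_check=4.0, m0=2.0, alpha=0.5)  # 3.0
```

This form needs only the previous moving average M0 and one new point position, rather than the full history required by the weighted-sum formula.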
  • S112 Determine a second average value according to the point position corresponding to the current inspection iteration and the point position of the historical iteration before the current inspection iteration.
  • the point position corresponding to the current inspection iteration can be determined according to the target data bit width of the current inspection iteration and the data to be quantified.
  • the second mean value M2 can be calculated according to the following formula:
  • M2 = b1×s t + b2×s t-1 + b3×s t-2 + ... + bm×s 1 ;
  • where b1~bm refer to the calculation weights corresponding to the point positions of each iteration,
  • s t refers to the point position corresponding to the current inspection iteration,
  • s t-1 , s t-2 , ..., s 1 refer to the point positions corresponding to the historical iterations before the current inspection iteration,
  • M2 refers to the above-mentioned second mean value.
  • the current inspection iteration is the 101st iteration of the cyclic neural network operation
  • the historical iteration before the current inspection iteration refers to the 1st iteration to the 100th iteration.
  • the processor can obtain the point position of the 101st iteration (that is, s t ), and obtain the point positions of the historical iterations before the 101st iteration: s 1 can refer to the point position corresponding to the 1st iteration of the recurrent neural network, ..., s t-2 can refer to the point position corresponding to the 99th iteration of the recurrent neural network, and s t-1 can refer to the point position corresponding to the 100th iteration of the recurrent neural network.
  • the processor may calculate the second average value M2 according to the above formula.
  • the second average value may be calculated according to the point position of the inspection iteration corresponding to each iteration interval.
  • FIG. 8 shows a flowchart of a method for determining a second average value in an embodiment of the present disclosure.
  • the foregoing operation S112 may include the following operations:
  • the second average value can be calculated according to the following formula:
  • M2 = b1×s t + b2×s t-1 + b3×s t-2 + ... + bm×s 1 ;
  • where b1~bm refer to the calculation weights corresponding to the point positions of each inspection iteration,
  • s t refers to the point position corresponding to the current inspection iteration,
  • s t-1 refers to the point position corresponding to the inspection iteration of the iteration interval before the current inspection iteration,
  • s t-2 refers to the point position corresponding to the inspection iteration of the iteration interval before that, and so on,
  • M2 refers to the above-mentioned second mean value.
  • the current inspection iteration is the 100th iteration
  • the historical iteration may be from the 1st iteration to the 99th iteration
  • the 99 historical iterations may be divided into 11 iteration intervals.
  • the 1st iteration to the 9th iteration belong to the 1st iteration interval
  • the 10th iteration to the 18th iteration belong to the 2nd iteration interval
  • the 90th iteration to the 99th iteration belong to the 11th iteration Iteration interval.
  • the processor can obtain the point position of the 100th iteration (that is, s t ), and obtain the point positions of the inspection iterations of the iteration intervals before the 100th iteration: s 1 can refer to the point position corresponding to the inspection iteration of the 1st iteration interval of the recurrent neural network (for example, the point position corresponding to the 1st iteration), ..., s t-2 can refer to the point position corresponding to the inspection iteration of the 10th iteration interval of the recurrent neural network, and s t-1 can refer to the point position corresponding to the inspection iteration of the 11th iteration interval (for example, the point position corresponding to the 90th iteration).
  • the processor may calculate the second average value M2 according to the above formula.
  • the iteration interval includes the same number of iterations.
  • the number of iterations contained in the iteration interval may be different.
  • the number of iterations included in the iteration interval increases with the increase of iterations, that is, as the training or fine-tuning of the cyclic neural network proceeds, the iteration interval may become larger and larger.
  • the processor may determine the second average value according to the point position corresponding to the current inspection iteration and the first average value, that is, the second average value can be calculated according to the following formula:
  • M2 = β×s t + (1−β)×M1;
  • where β refers to the calculation weight of the point position corresponding to the current inspection iteration, s t refers to the point position corresponding to the current inspection iteration, and M1 refers to the above-mentioned first mean value.
  • the first error may be equal to the absolute value of the difference between the second average value and the aforementioned first average value.
  • the above-mentioned first error can be calculated according to the following formula: diff update1 = |M2 − M1|.
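Putting the two means together, the first error is the absolute difference between the second mean and the first mean. A minimal sketch, assuming the running-mean form of M2 with β a hyperparameter in (0, 1):

```python
def first_error(point_pos_current, m1, beta=0.5):
    # M2 = beta * s_t + (1 - beta) * M1, then diff_update1 = |M2 - M1|.
    # The first error measures how far the point position of the current
    # inspection iteration has drifted from its historical moving average.
    m2 = beta * point_pos_current + (1 - beta) * m1
    return abs(m2 - m1)

diff_update1 = first_error(point_pos_current=6.0, m1=4.0, beta=0.5)  # 1.0
```

A large first error indicates a large variation range of the point position, which argues for a shorter target iteration interval.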
  • the above-mentioned point position of the current inspection iteration can be determined according to the data to be quantified in the current inspection iteration and the target data bit width corresponding to the current inspection iteration.
  • the target data bit width corresponding to the current inspection iteration described above may be a hyperparameter.
  • the target data bit width corresponding to the current inspection iteration may be user-defined input.
  • the data bit width corresponding to the data to be quantized in the training or fine-tuning process of the recurrent neural network may be constant, that is, the same type of data to be quantized in the same recurrent neural network is quantized with the same data bit width; for example, the neuron data in each iteration of the recurrent neural network is quantized with a data bit width of 8 bits.
  • the data bit width corresponding to the data to be quantized in the training or fine-tuning process of the recurrent neural network is variable to ensure that the data bit width can meet the quantization requirements of the data to be quantized. That is, the processor can adaptively adjust the data bit width corresponding to the data to be quantized according to the data to be quantized, and obtain the target data bit width corresponding to the data to be quantized. Specifically, the processor may first determine the target data bit width corresponding to the current inspection iteration, and then, the processor may determine the current inspection iteration corresponding to the target data bit width corresponding to the current inspection iteration and the data to be quantified corresponding to the current inspection iteration. Point location.
  • FIG. 9 shows a flowchart of a data bit width adjustment method in an embodiment of the present disclosure.
  • the above operation S110 may include:
  • the foregoing processor may use the initial data bit width to quantize the data to be quantized to obtain the foregoing quantized data.
  • the initial data bit width of the current inspection iteration may be a hyperparameter, and the initial data bit width of the current inspection iteration may also be determined based on the data to be quantified of the previous inspection iteration before the current inspection iteration.
  • the processor may determine the intermediate representation data according to the to-be-quantized data of the current inspection iteration and the quantized data of the current inspection iteration.
  • the intermediate representation data is consistent with the aforementioned representation format of the data to be quantized.
  • the processor may perform inverse quantization on the aforementioned quantized data to obtain intermediate representation data consistent with the representation format of the data to be quantized, where inverse quantization refers to the inverse process of quantization.
  • the quantized data can be obtained using the above formula (3), and the processor can also dequantize the quantized data according to the above formula (4) to obtain the corresponding intermediate representation data, and determine the quantization error according to the data to be quantized and the intermediate representation data .
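As a concrete illustration of quantization, inverse quantization, and the resulting intermediate representation data, the sketch below assumes the simple symmetric scheme I_x = round(F_x / 2^s) and F_x1 = I_x × 2^s; the actual formulas (3) and (4) of the disclosure are not shown in this excerpt and may also involve a scaling factor and offset:

```python
def quantize(data, s, bit_width=8):
    # Fixed-point quantization: scale by 2^-s, round to the nearest
    # integer, clamp to the signed range of the given bit width.
    qmax = 2 ** (bit_width - 1) - 1
    qmin = -(2 ** (bit_width - 1))
    return [max(qmin, min(qmax, round(x / 2 ** s))) for x in data]

def dequantize(qdata, s):
    # Inverse quantization: intermediate representation data F_x1,
    # expressed in the same format as the data to be quantized.
    return [q * 2 ** s for q in qdata]

data = [0.51, -1.24, 3.12]
q = quantize(data, s=-4)      # quantization step is 2^-4 = 0.0625
f_x1 = dequantize(q, s=-4)
max_err = max(abs(a - b) for a, b in zip(data, f_x1))
```

The per-element error of such a scheme is bounded by half the quantization step, i.e. 2^(s−1), which is exactly the approximation used further below to estimate the quantization error without performing inverse quantization.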
  • the processor may calculate the quantization error according to the data to be quantized and the corresponding intermediate representation data.
  • the processor may determine an error term according to the to-be-quantized data F x and its corresponding intermediate representation data F x1 , and determine the quantization error according to the error term.
  • the processor may determine the aforementioned error term according to the sum of the elements in the intermediate representation data F x1 and the sum of the elements in the to-be-quantized data F x .
  • the error term may be the sum of the elements in the intermediate representation data F x1.
  • the processor can determine the quantization error according to the error term.
  • the specific quantization error can be determined according to the following formula:
  • diff bit = log2( ( Σ|z i (n) | − Σ|z i | ) / Σ|z i | + 1 );
  • where z i is an element in the data to be quantized, and z i (n) is the corresponding element in the intermediate representation data F x1.
  • the processor may calculate the difference between each element in the data to be quantized and the corresponding element in the intermediate representation data F x1 to obtain m difference values, and take the sum of these m difference values as the error term. After that, the processor can determine the quantization error according to the error term.
  • the specific quantization error can be determined according to the following formula:
  • diff bit = log2( Σ|z i (n) − z i | / Σ|z i | + 1 );
  • where z i is an element in the data to be quantized, and z i (n) is the corresponding element in the intermediate representation data F x1.
  • the difference between each element in the data to be quantized and the corresponding element in the intermediate representation data F x1 may be approximately equal to 2 s-1 ; therefore, the quantization error may also be determined according to the following formula:
  • diff bit = log2( 2 s-1 × m / Σ|z i | + 1 );
  • where m is the number of elements in the intermediate representation data F x1 corresponding to the target data, s is the point position, and z i is an element in the data to be quantized.
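The approximate form described above (each per-element error replaced by 2^(s−1) and normalized by the total magnitude of the data to be quantized) can be sketched directly. The exact expression diff_bit = log2(2^(s−1)·m / Σ|z_i| + 1) is reconstructed from the surrounding definitions and is an assumption where the original equation image is missing:

```python
import math

def quantization_error_estimate(data, s):
    # Approximates the quantization error from the point position alone:
    # each element's rounding error is bounded by half the quantization
    # step, 2^(s-1), so the total error is roughly 2^(s-1) * m, which is
    # then normalized by the total magnitude of the data to be quantized.
    m = len(data)
    total = sum(abs(z) for z in data)
    return math.log2(2 ** (s - 1) * m / total + 1)

err = quantization_error_estimate([0.5, -1.25, 3.0], s=-4)
```

A coarser point position (larger s) yields a larger estimated error, which is what drives the bit-width adjustment in the next operations.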
  • the intermediate representation data can also be consistent with the data representation format of the aforementioned quantized data, and the quantization error is determined based on the intermediate representation data and the quantized data.
  • the data to be quantified can be expressed as: F x ⁇ I x ⁇ 2 s , then the intermediate representation data can be determined
  • the intermediate representation data I x1 may have the same data representation format as the aforementioned quantized data.
• the processor can determine the quantization error according to the intermediate representation data I x1 and the above formula (3).
  • the specific quantization error determination method can refer to the above formula (31) to formula (33).
  • the processor may adaptively adjust the data bit width corresponding to the current inspection iteration according to the quantization error, and determine the target data bit width adjusted by the current inspection iteration.
  • the quantization error satisfies the preset condition
  • the data bit width corresponding to the current inspection iteration can be kept unchanged, that is, the target data bit width of the current inspection iteration can be equal to the initial data bit width.
  • the processor can adjust the data bit width corresponding to the data to be quantized in the current inspection iteration to obtain the target data bit width corresponding to the current inspection iteration.
• when the processor uses the target data bit width to quantize the data to be quantized in the current inspection iteration, the quantization error satisfies the aforementioned preset condition.
  • the aforementioned preset condition may be a preset threshold set by the user.
  • FIG. 10 shows a flowchart of a data bit width adjustment method in another embodiment of the present disclosure.
  • the foregoing operation S115 may include:
  • the processor may determine whether the aforementioned quantization error is greater than or equal to a first preset threshold.
• if so, operation S1151 may be performed to increase the data bit width corresponding to the current inspection iteration to obtain the target data bit width of the current inspection iteration.
• if the quantization error is less than the first preset threshold, the data bit width of the current inspection iteration can be kept unchanged.
  • the processor may obtain the above-mentioned target data bit width after one adjustment.
  • the initial data bit width of the current inspection iteration is n1
• when the target data bit width n2 is used to quantize the to-be-quantized data of the current inspection iteration, the obtained quantization error may be less than the first preset threshold.
• the processor may obtain the target data bit width through multiple adjustments until the quantization error is less than the first preset threshold, and use the data bit width at that point as the target data bit width. Specifically, if the quantization error is greater than or equal to the first preset threshold, a first intermediate data bit width is determined according to the first preset bit width step; the processor then quantizes the data to be quantized in the current inspection iteration according to the first intermediate data bit width to obtain quantized data, and determines the quantization error according to the data to be quantized in the current inspection iteration and its quantized data, repeating until the quantization error is less than the first preset threshold.
  • the processor may use the corresponding data bit width when the quantization error is less than the first preset threshold value as the target data bit width.
  • the initial data bit width of the current inspection iteration is n1
• the processor can use the initial data bit width n1 to quantize the data to be quantized A of the current inspection iteration to obtain the quantized data B1, and calculate the quantization error C1 according to the data to be quantized A and the quantized data B1.
• the aforementioned first preset bit width step may be a constant value. For example, whenever the quantization error is greater than the first preset threshold, the processor may increase the data bit width corresponding to the current inspection iteration by the same fixed bit width value.
• the aforementioned first preset bit width step may also be a variable value. For example, the processor may calculate the difference between the quantization error and the first preset threshold; the smaller that difference, the smaller the value of the first preset bit width step.
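The multiple-adjustment loop described above (keep adding the first preset bit width step until the quantization error drops below the first preset threshold) might be sketched as follows. The quantizer and error metric here are illustrative stand-ins, not the patent's formulas; `first_threshold`, `step`, and `max_bits` are assumed parameters.

```python
def quantization_error(data, n):
    """Mean absolute reconstruction error of an n-bit max-abs quantizer
    (an illustrative stand-in for the patent's error metric)."""
    max_abs = max(abs(x) for x in data) or 1.0
    scale = max_abs / (2 ** (n - 1) - 1)
    dequantized = [round(x / scale) * scale for x in data]
    return sum(abs(a - b) for a, b in zip(data, dequantized)) / len(data)

def increase_bit_width(data, n1, first_threshold, step=2, max_bits=32):
    """Keep adding the first preset bit width step until the quantization
    error falls below the first preset threshold; the bit width in force
    at that point is the target data bit width."""
    n = n1
    while quantization_error(data, n) >= first_threshold and n + step <= max_bits:
        n += step  # first intermediate data bit width, re-checked next pass
    return n

data = [(-1) ** i * (i % 97) / 11.0 for i in range(500)]
n2 = increase_bit_width(data, n1=4, first_threshold=0.01)
```

Because the loop exits only once the error condition is satisfied (or the cap is reached), the returned `n2` is the smallest tried bit width whose error is below the threshold.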
  • FIG. 11 shows a flowchart of a method for adjusting a data bit width in another embodiment of the present disclosure.
  • the foregoing operation S115 may further include:
• the processor may determine whether the aforementioned quantization error is less than or equal to a second preset threshold.
• if so, operation S1153 may be performed to reduce the data bit width corresponding to the current inspection iteration to obtain the target data bit width of the current inspection iteration.
• if the quantization error is greater than the second preset threshold, the data bit width of the current inspection iteration can be kept unchanged.
  • the processor may obtain the above-mentioned target data bit width after one adjustment.
  • the initial data bit width of the current inspection iteration is n1
• when the target data bit width n2 is used to quantize the to-be-quantized data of the current inspection iteration, the obtained quantization error may be greater than the second preset threshold.
• the processor may obtain the target data bit width through multiple adjustments until the quantization error is greater than the second preset threshold, and use the data bit width at that point as the target data bit width. Specifically, if the quantization error is less than or equal to the second preset threshold, a second intermediate data bit width is determined according to the second preset bit width step; the processor then quantizes the data to be quantized in the current inspection iteration according to the second intermediate data bit width to obtain quantized data, and determines the quantization error according to the data to be quantized in the current inspection iteration and its quantized data, repeating until the quantization error is greater than the second preset threshold.
  • the processor may use the corresponding data bit width when the quantization error is greater than the second preset threshold value as the target data bit width.
  • the initial data bit width of the current inspection iteration is n1
• the processor can use the initial data bit width n1 to quantize the data to be quantized A of the current inspection iteration to obtain the quantized data B1, and calculate the quantization error C1 according to the data to be quantized A and the quantized data B1.
• the aforementioned second preset bit width step may be a constant value. For example, whenever the quantization error is less than the second preset threshold, the processor may reduce the data bit width corresponding to the current inspection iteration by the same fixed bit width value.
• the aforementioned second preset bit width step may also be a variable value. For example, the processor may calculate the difference between the quantization error and the second preset threshold; the smaller that difference, the smaller the value of the second preset bit width step.
  • FIG. 12 shows a flowchart of a data bit width adjustment method in another embodiment of the present disclosure.
• if the processor determines that the quantization error is less than the first preset threshold and greater than the second preset threshold, the data bit width of the current inspection iteration can be kept unchanged, where the first preset threshold is greater than the second preset threshold. That is, the target data bit width of the current inspection iteration can be equal to the initial data bit width.
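The combined decision of FIG. 10 to FIG. 12 (increase the bit width when the error reaches the first threshold, decrease it when the error falls to the second threshold, keep it otherwise) reduces to a small function. The step sizes and the 2-bit floor are illustrative assumptions, not values from the patent.

```python
def target_bit_width(error, n, first_threshold, second_threshold,
                     step_up=2, step_down=1):
    """One adjustment round for the data bit width of the current
    inspection iteration (requires first_threshold > second_threshold)."""
    assert first_threshold > second_threshold
    if error >= first_threshold:       # FIG. 10: error too large, widen
        return n + step_up
    if error <= second_threshold:      # FIG. 11: precision to spare, narrow
        return max(2, n - step_down)   # illustrative 2-bit floor
    return n                           # FIG. 12: keep n unchanged
```

For example, `target_bit_width(0.5, 8, 0.3, 0.05)` widens 8 bits to 10, while an error strictly between the two thresholds leaves the initial bit width untouched.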
  • FIG. 12 only illustrates the data bit width determination method of an embodiment of the present disclosure by way of example, and the sequence of each operation in FIG. 12 can be adjusted adaptively, which is not specifically limited here.
  • FIG. 13 shows a flowchart of a method for determining a second average value in another embodiment of the present disclosure. As shown in FIG. 13, the above method may further include:
• if the data bit width adjustment value is greater than the preset parameter (for example, the preset parameter may be equal to zero), that is, when the data bit width of the current inspection iteration increases, the processor may reduce the second average value accordingly; if the data bit width adjustment value is less than the preset parameter, that is, when the data bit width of the current inspection iteration decreases, the processor may increase the second average value accordingly.
  • the processor may not update the second average value, that is, the processor may not perform the above operation S117.
• the updated second average value M2 ← μ×(s t − Δn) + (1−μ)×(M1 − Δn).
• the updated second mean value M2 ← μ×(s t − Δn) + (1−μ)×(M1 + Δn), where s t refers to the point position determined according to the target data bit width in the current inspection iteration.
• the updated second average value M2 ← μ×s t + (1−μ)×M1 − Δn.
• the updated second mean value M2 ← μ×s t + (1−μ)×M1 + Δn, where s t refers to the point position determined by the current inspection iteration according to the target data bit width.
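The sliding-average update above might be sketched as follows, treating the bit width adjustment Δn as signed (positive when the data bit width of the current inspection iteration increases, negative when it decreases) so that a single −Δn correction covers both branches. μ is the sliding-average hyperparameter; the signed convention is our assumption based on the formulas above.

```python
def update_second_mean(s_t, m1, delta_n, mu=0.1):
    """Update the second mean M2 of the point position after the data
    bit width changes by delta_n bits (delta_n may be negative).
    s_t: point position determined from the target data bit width of the
    current inspection iteration; m1: the previous mean M1."""
    return mu * s_t + (1 - mu) * m1 - delta_n
```

With `delta_n == 0` this reduces to the ordinary sliding average μ×s_t + (1−μ)×M1, matching the unchanged-bit-width case where the second mean is not adjusted.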
  • the foregoing operation S200 may include:
• S210: Determine a first target iteration interval according to the variation amplitude of the point position, where the first target iteration interval is negatively related to the aforementioned variation amplitude of the point position. That is, the greater the variation range of the point position, the smaller the first target iteration interval; the smaller the variation range of the point position, the larger the first target iteration interval.
  • the above-mentioned operation S210 may include:
  • the processor may determine the first target iteration interval according to the first error, where the first target iteration interval is negatively related to the first error. That is, the larger the first error, the larger the variation range of the point position, which in turn indicates that the data variation range of the data to be quantized is larger. At this time, the first target iteration interval is smaller.
• the processor may calculate the first target iteration interval I according to the following formula: I = β/diff update1 − γ
  • I is the first target iteration interval
  • diff update1 represents the above-mentioned first error
• β and γ may be hyperparameters.
  • the first error can be used to measure the variation range of the point position.
• the first target iteration interval is determined by calculating the variation range (the first error) of the point position and setting the interval according to that variation range. Since the quantization parameter is determined according to the first target iteration interval, the quantized data obtained by quantization according to the quantization parameter can better match the change trend of the point position of the target data, improving the operating efficiency of the recurrent neural network while ensuring quantization accuracy.
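The negative correlation between the first error and the first target iteration interval can be realized with an interval of the form β/diff_update1 − γ, consistent with the description above. The concrete β and γ values and the floor of one iteration are illustrative assumptions.

```python
def first_target_interval(diff_update1, beta=4.0, gamma=1.0):
    """First target iteration interval I: the larger the first error
    (the point-position fluctuation), the shorter the interval, so the
    quantization parameter is re-checked more often."""
    return max(1, int(beta / diff_update1 - gamma))
```

For instance, a first error of 0.5 yields an interval of 7 iterations, while a first error of 4.0 forces a check every iteration.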
  • the processor may further determine the quantization parameters and data bit widths corresponding to the first target iteration interval at the current inspection iteration, so as to be based on the first target iteration interval. Update the quantization parameters.
  • the quantization parameter may include a point position and/or a scaling factor. Further, the quantization parameter may also include an offset.
  • FIG. 14 shows a flowchart of a quantization parameter adjustment method according to another embodiment of the present disclosure. As shown in FIG. 14, the above method may further include:
  • the processor adjusts the quantization parameter in the cyclic neural network operation according to the first target iteration interval.
• the processor may determine update iterations (also called inspection iterations) according to the first target iteration interval and the total number of iterations in each cycle, update the first target iteration interval at each update iteration, and may also update the quantization parameter at each update iteration.
• for example, the data bit width in the cyclic neural network operation may remain unchanged.
  • the processor can directly adjust the quantization parameters such as the point position according to the to-be-quantized data of the update iteration at each update iteration.
• when the data bit width in the cyclic neural network operation is variable, the processor can update the data bit width at each update iteration, and adjust quantization parameters such as the point position according to the updated data bit width and the data to be quantized in the update iteration.
  • the processor updates the quantization parameter at each inspection iteration to ensure that the current quantization parameter meets the quantization requirement of the data to be quantized.
  • the first target iteration interval before the update and the first target iteration interval after the update may be the same or different.
  • the data bit width before the update and the data bit width after the update can be the same or different; that is, the data bit width of different iteration intervals can be the same or different.
  • the quantization parameter before the update and the quantization parameter after the update may be the same or different; that is, the quantization parameters of different iteration intervals may be the same or different.
  • the processor may determine the quantization parameter in the first target iteration interval at the update iteration, so as to adjust the quantization parameter in the recurrent neural network operation.
• when the method is used in the training or fine-tuning process of the recurrent neural network, operation S200 may include:
  • the processor determines whether the current inspection iteration is greater than the first preset iteration, wherein, when the current inspection iteration is greater than the first preset iteration, the first target iteration interval is determined according to the data variation range of the data to be quantified. When the current inspection iteration is less than or equal to the first preset iteration, the quantization parameter is adjusted according to the preset iteration interval.
  • the current inspection iteration refers to the iterative operation currently performed by the processor.
  • the first preset iteration may be a hyperparameter, the first preset iteration may be determined according to a data variation curve of the data to be quantified, and the first preset iteration may also be set by a user.
• the first preset iteration may be less than the total number of iterations included in one cycle (epoch), where one cycle means that all data to be quantized in the data set complete one forward operation and one reverse operation.
  • the processor may read the first preset iteration input by the user, and determine the preset iteration interval according to the correspondence between the first preset iteration and the preset iteration interval.
  • the preset iteration interval may be a hyperparameter, and the preset iteration interval may also be set by a user.
  • the processor can directly read the first preset iteration and the preset iteration interval input by the user, and update the quantization parameter in the cyclic neural network operation according to the preset iteration interval.
  • the processor does not need to determine the target iteration interval according to the data variation range of the data to be quantified.
• the quantization parameter can be updated according to the preset iteration interval. That is, the processor can determine to update the quantization parameter every 5 iterations from the first iteration to the 100th iteration of the training or fine-tuning of the recurrent neural network. Specifically, the processor may determine quantization parameters such as the data bit width n1 and the point position s1 corresponding to the first iteration, and use the data bit width n1, the point position s1, and other quantization parameters to quantize the data to be quantized from the first iteration to the fifth iteration; that is, the same quantization parameters can be used from the first iteration to the fifth iteration.
• the processor can determine the quantization parameters such as the data bit width n2 and the point position s2 corresponding to the 6th iteration, and use the data bit width n2 and the point position s2 to quantize the data to be quantized from the 6th iteration to the 10th iteration; that is, the same quantization parameters can be used from the 6th iteration to the 10th iteration.
  • the processor can follow the above-mentioned quantization method until the 100th iteration is completed.
  • the method for determining the quantization parameters such as the data bit width and the point position in each iteration interval can be referred to the above description, and will not be repeated here.
  • the first preset iteration input by the user is the 100th iteration
• the preset iteration interval is 1, and when the current inspection iteration is less than or equal to the 100th iteration, the quantization parameter can be updated according to the preset iteration interval. That is, the processor can determine to update the quantization parameter in each iteration from the first iteration to the 100th iteration of the training or fine-tuning of the recurrent neural network. Specifically, the processor may determine quantization parameters such as the data bit width n1 and point position s1 corresponding to the first iteration, and use the data bit width n1 and point position s1 to quantize the data to be quantized in the first iteration.
• the processor can determine the quantization parameters such as the data bit width n2 and the point position s2 corresponding to the second iteration, and use the data bit width n2 and the point position s2 to quantize the data to be quantized in the second iteration.
• the processor can determine the quantization parameters such as the data bit width n100 and the point position s100 of the 100th iteration, and use the data bit width n100 and the point position s100 to quantize the data to be quantized in the 100th iteration.
  • the method for determining the quantization parameters such as the data bit width and the point position in each iteration interval can be referred to the above description, and will not be repeated here.
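The preset-interval phase walked through above (first preset iteration 100, preset interval 5 or 1) amounts to refreshing the quantization parameters on a fixed schedule before switching to the data-driven target interval. A sketch, with 1-indexed iterations as in the examples:

```python
def update_iterations(first_preset=100, preset_interval=5):
    """Iterations (1-indexed) at which the quantization parameters are
    refreshed while the current inspection iteration is still less than
    or equal to the first preset iteration."""
    return [i for i in range(1, first_preset + 1)
            if (i - 1) % preset_interval == 0]

schedule = update_iterations()
```

With a preset interval of 5 the parameters computed at iteration 1 serve iterations 1 to 5, those computed at iteration 6 serve iterations 6 to 10, and so on; with a preset interval of 1 every iteration refreshes its own parameters.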
• the processor may also determine the iteration interval of the point position according to the variation range of the point position, and update quantization parameters such as the point position according to the iteration interval of the point position.
  • the current inspection iteration when the current inspection iteration is greater than the first preset iteration, it can indicate that the training or fine-tuning of the recurrent neural network is in the mid-stage.
• the data variation range of the data to be quantified in the historical iterations can be obtained, and the first target iteration interval can be determined according to that data variation range; the first target iteration interval may be greater than the above-mentioned preset iteration interval, thereby reducing the number of times the quantization parameter is updated and improving quantization efficiency and computing efficiency.
  • the first target iteration interval is determined according to the data variation range of the data to be quantified.
  • the first preset iteration input by the user is the 100th iteration
  • the preset iteration interval is 1.
  • the quantization parameter can be updated according to the preset iteration interval. That is, the processor may determine to update the quantization parameter in each iteration from the first iteration to the 100th iteration of the training or fine-tuning of the recurrent neural network, and the specific implementation manner can be referred to the above description.
• the processor can determine the data change range of the data to be quantified according to the data to be quantified in the current inspection iteration and the data to be quantified in the previous historical iterations, and determine the first target iteration interval based on that change range. Specifically, when the current inspection iteration is greater than the 100th iteration, the processor can adaptively adjust the data bit width corresponding to the current inspection iteration, obtain the target data bit width corresponding to the current inspection iteration, and use the target data bit width as the data bit width of the first target iteration interval, where the data bit widths corresponding to the iterations in the first target iteration interval are consistent.
  • the processor may determine the point position corresponding to the current inspection iteration according to the target data bit width corresponding to the current inspection iteration and the data to be quantified, and determine the first error according to the point position corresponding to the current inspection iteration.
  • the processor may also determine the quantization error according to the data to be quantized corresponding to the current inspection iteration, and determine the second error according to the quantization error.
  • the processor may determine the first target iteration interval according to the first error and the second error, and the first target iteration interval may be greater than the aforementioned preset iteration interval. Further, the processor may determine a quantization parameter such as a point position or a scaling factor in the first target iteration interval, and the specific determination method may refer to the above description.
• the processor may determine that the first target iteration interval includes 3 iterations: the 100th iteration, the 101st iteration, and the 102nd iteration.
• the processor can also determine the quantization error according to the data to be quantized in the 100th iteration, determine the second error and the target data bit width corresponding to the 100th iteration according to the quantization error, and use the target data bit width as the data bit width corresponding to the first target iteration interval, so that the data bit widths of the 100th iteration, the 101st iteration, and the 102nd iteration are all the target data bit width corresponding to the 100th iteration.
  • the processor may also determine quantization parameters such as point positions and scaling coefficients corresponding to the 100th iteration according to the data to be quantized in the 100th iteration and the target data bit width corresponding to the 100th iteration. After that, the quantization parameter corresponding to the 100th iteration is used to quantize the 100th iteration, the 101st iteration, and the 102nd iteration.
  • operation S200 may further include:
  • the second target iteration interval and the total number of iterations in each cycle are used to determine the second inspection iteration corresponding to the current inspection iteration.
  • the second preset iteration is greater than the first preset iteration
• the quantization adjustment process of the cyclic neural network includes multiple cycles, and the total numbers of iterations in the multiple cycles are not consistent.
  • the processor may further determine whether the current inspection iteration is greater than the second preset iteration.
  • the second preset iteration is greater than the first preset iteration
  • the second preset iteration interval is greater than the preset iteration interval.
  • the foregoing second preset iteration may be a hyperparameter, and the second preset iteration may be greater than the total number of iterations in at least one cycle.
  • the second preset iteration may be determined according to the data variation curve of the data to be quantified.
  • the second preset iteration may also be customized by the user.
  • determining the second target iteration interval corresponding to the current inspection iteration according to the first target iteration interval and the total number of iterations in each cycle includes:
• the update period corresponding to the current inspection iteration is determined according to the ordinal number of the current inspection iteration within the current period and the total number of iterations in the periods after the current period, where the total number of iterations in the update period is greater than or equal to that ordinal number;
• the second target iteration interval is determined according to the first target iteration interval, the ordinal number of the iteration, and the total number of iterations in the periods between the current period and the update period.
• for example, the update iteration corresponding to the t2-th iteration in the first cycle iter1 may be the t1-th iteration in the second cycle iter2; it is determined at the t2-th iteration of the first cycle iter1 that the quantization parameter needs to be updated.
  • the processor can update the quantization parameter and the first target iteration interval according to the preset iteration interval and the second target iteration interval.
  • the second target iteration interval is called the reference iteration interval or the target iteration interval.
• by determining quantization parameters such as the point positions in the reference iteration interval, the processor achieves the purpose of adjusting the quantization parameters in the cyclic neural network operation according to the reference iteration interval, wherein the quantization parameters corresponding to the iterations in the reference iteration interval may be consistent.
• each iteration in the reference iteration interval uses the same point position, and quantization parameters such as the point position are determined and updated only at each inspection iteration, which avoids updating and adjusting the quantization parameters in every iteration, thereby reducing the amount of calculation in the quantization process and improving the efficiency of the quantization operation.
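Reusing one point position across the whole reference iteration interval can be sketched as a small cache. The point-position formula used here, ceil(log2(max|x| / (2^(n-1) − 1))), is a common choice standing in for the patent's formula (2), which is not reproduced in this text; the interval bookkeeping is the part being illustrated.

```python
import math

class IntervalPointPosition:
    """Recompute the point position only at inspection iterations (the
    start of each reference iteration interval) and reuse it in between."""

    def __init__(self, interval):
        self.interval = interval
        self.s = None            # cached point position
        self.recomputes = 0      # how many inspection iterations occurred

    def get(self, iteration, data, bit_width):
        # 1-indexed iterations: 1, 1 + interval, ... are inspection iterations
        if self.s is None or (iteration - 1) % self.interval == 0:
            max_abs = max(abs(x) for x in data) or 1.0
            self.s = math.ceil(math.log2(max_abs / (2 ** (bit_width - 1) - 1)))
            self.recomputes += 1
        return self.s

pp = IntervalPointPosition(interval=3)
positions = [pp.get(i, [0.5, -1.75, 3.0], 8) for i in range(1, 7)]
```

Over six iterations with an interval of 3, the point position is computed only twice (at iterations 1 and 4), and every iteration in between reuses the cached value.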
• the processor may determine the point position corresponding to the current inspection iteration according to the data to be quantified in the current inspection iteration and the target data bit width corresponding to the current inspection iteration, use the point position corresponding to the current inspection iteration as the point position corresponding to the reference iteration interval, and use the point position corresponding to the current inspection iteration for the iterations in the reference iteration interval.
  • the target data bit width corresponding to the current inspection iteration may be a hyperparameter.
  • the target data bit width corresponding to the current inspection iteration is customized by the user.
  • the point position corresponding to the current inspection iteration can be calculated by referring to formula (2) or formula (14) above.
• the data bit width corresponding to each iteration in the cyclic neural network operation may change, that is, the data bit widths corresponding to different reference iteration intervals may be inconsistent, but the data bit width of each iteration within a reference iteration interval remains constant.
  • the data bit width corresponding to the iteration in the reference iteration interval may be a hyperparameter.
  • the data bit width corresponding to the iteration in the reference iteration interval may be user-defined input.
  • the data bit width corresponding to the iteration in the reference iteration interval may also be calculated by the processor.
• the processor may determine the target data bit width corresponding to the current inspection iteration according to the data to be quantified in the current inspection iteration, and use the target data bit width corresponding to the current inspection iteration as the data bit width corresponding to the reference iteration interval.
• the quantization parameters such as the corresponding point position in the reference iteration interval may also remain unchanged. That is to say, each iteration in the reference iteration interval uses the same point position, and quantization parameters such as the point position and the data bit width are determined and updated only at each inspection iteration, so as to avoid updating and adjusting the quantization parameters in each iteration, thereby reducing the amount of calculation in the quantization process and improving the efficiency of the quantization operation.
  • the scaling factors corresponding to iterations in the reference iteration interval may be consistent.
  • the processor may determine the scaling factor corresponding to the current test iteration according to the to-be-quantized data of the current test iteration, and use the scaling factor corresponding to the current test iteration as the scaling factor of each iteration in the reference iteration interval. Wherein, the scaling factors corresponding to the iterations in the reference iteration interval are consistent.
  • the offsets corresponding to the iterations in the reference iteration interval are consistent.
  • the processor may determine the offset corresponding to the current inspection iteration according to the to-be-quantized data of the current inspection iteration, and use the offset corresponding to the current inspection iteration as the offset of each iteration in the reference iteration interval. Further, the processor may also determine the minimum and maximum values among all the elements of the data to be quantized, and further determine quantization parameters such as point positions and scaling coefficients. For details, please refer to the above description.
• the reference iteration interval may count iterations from the current inspection iteration, that is, the inspection iteration corresponding to the reference iteration interval may be the initial iteration of the reference iteration interval. For example, if the current inspection iteration is the 100th iteration and the processor determines that the reference iteration interval is 3 based on the data change range of the data to be quantified, the processor can determine that the reference iteration interval includes 3 iterations, which are the 100th iteration, the 101st iteration, and the 102nd iteration respectively.
  • the processor may determine the quantization parameters such as the point position corresponding to the 100th iteration according to the data to be quantized corresponding to the 100th iteration and the target data bit width, and may use the quantization parameters such as the point position corresponding to the 100th iteration to quantize the 100th iteration, the 101st iteration, and the 102nd iteration.
  • the processor does not need to calculate quantization parameters such as point positions in the 101st iteration and the 102nd iteration, which reduces the amount of calculation in the quantization process and improves the efficiency of the quantization operation.
  • the reference iteration interval may also be the number of iterations calculated from the next iteration of the current inspection iteration, that is, the inspection iteration corresponding to the reference iteration interval may also be the termination iteration of the reference iteration interval.
  • the current inspection iteration is the 100th iteration
  • the processor determines that the reference iteration interval is 3 according to the data variation range of the data to be quantized. Then the processor may determine that the reference iteration interval includes 3 iterations, which are the 101st iteration, the 102nd iteration, and the 103rd iteration, respectively.
  • the processor may determine the quantization parameters such as the point position corresponding to the 100th iteration according to the data to be quantized corresponding to the 100th iteration and the target data bit width, and may use the quantization parameters such as the point position corresponding to the 100th iteration to quantize the 101st iteration, the 102nd iteration, and the 103rd iteration.
  • the processor does not need to calculate quantization parameters such as point positions in the 102nd iteration and the 103rd iteration, which reduces the amount of calculation in the quantization process and improves the efficiency of the quantization operation.
  • the data bit widths and quantization parameters corresponding to each iteration in the same reference iteration interval are all consistent, that is, the data bit width, point position, scaling factor, and offset corresponding to each iteration in the same reference iteration interval all remain unchanged, so that during the training or fine-tuning process of the recurrent neural network, frequent adjustment of the quantization parameters of the data to be quantized can be avoided, the amount of calculation in the quantization process can be reduced, and the quantization efficiency can be improved. In addition, by dynamically adjusting the quantization parameters at different stages of training or fine-tuning according to the data variation range, the quantization accuracy can be ensured.
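  • As an illustrative sketch only (the function names and the simplified fixed-point scheme below are assumptions for exposition, not the exact method of this disclosure), reusing a single point position across all iterations of one reference iteration interval can be expressed as:

```python
import math

def point_position(data, bit_width):
    # Point position s in the spirit of a formula like (2): chosen so the
    # largest magnitude fits in bit_width bits (illustrative form).
    max_abs = max(abs(x) for x in data)
    return math.ceil(math.log2(max_abs / (2 ** (bit_width - 1) - 1)))

def quantize(data, s):
    # Fixed-point quantization with point position s (scaling factor and
    # offset omitted for brevity).
    return [round(x / (2 ** s)) * (2 ** s) for x in data]

bit_width = 8
# Data to be quantized for iterations 100, 101, and 102 of the example above.
iter_data = [[0.5, -1.2, 0.9], [0.48, -1.1, 0.95], [0.52, -1.25, 0.88]]

s = point_position(iter_data[0], bit_width)       # computed only at the inspection iteration
quantized = [quantize(d, s) for d in iter_data]   # all iterations in the interval share s
```

The point position is computed once at the inspection iteration and reused unchanged for the remaining iterations of the interval, so no per-iteration recomputation is needed.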
  • FIG. 15 shows a flowchart of adjusting quantization parameters in a quantization parameter adjustment method of an embodiment of the present disclosure.
  • the foregoing operation S300 may further include:
  • S310 Determine the data bit width corresponding to the reference iteration interval according to the to-be-quantized data of the current inspection iteration; wherein the data bit widths corresponding to the iterations in the reference iteration interval are consistent. That is to say, the data bit width during the operation of the cyclic neural network is updated every other reference iteration interval.
  • the data bit width corresponding to the reference iteration interval may be the target data bit width of the current inspection iteration.
  • the target data bit width of the current inspection iteration please refer to operations S114 and S115 above, which will not be repeated here.
  • the reference iteration interval may count the number of iterations from the current inspection iteration, that is, the inspection iteration corresponding to the reference iteration interval may be the initial iteration of the reference iteration interval. For example, if the current inspection iteration is the 100th iteration, and the processor determines according to the data variation range of the data to be quantized that the reference iteration interval is 6, the processor can determine that the reference iteration interval includes 6 iterations, which are the 100th iteration to the 105th iteration, respectively.
  • the processor can determine the target data bit width of the 100th iteration, and the target data bit width of the 100th iteration is used from the 101st to the 105th iteration, so there is no need to calculate the target data bit width from the 101st to the 105th iteration, thereby reducing the amount of calculation and improving the quantization efficiency and computing efficiency. After that, the 106th iteration can be used as the current inspection iteration, and the above operations of determining the reference iteration interval and updating the data bit width are repeated.
  • the reference iteration interval may also be calculated from the next iteration of the current inspection iteration, that is, the inspection iteration corresponding to the reference iteration interval may also be the termination iteration of the reference iteration interval.
  • the current inspection iteration is the 100th iteration
  • the processor determines that the iteration interval of the reference iteration interval is 6 according to the data change range of the data to be quantified. Then the processor may determine that the reference iteration interval includes 6 iterations, which are respectively the 101st iteration to the 106th iteration.
  • the processor can determine the target data bit width of the 100th iteration, and the target data bit width of the 100th iteration is used from the 101st to the 106th iteration, so there is no need to calculate the target data bit width from the 101st to the 106th iteration, which reduces the amount of calculation and improves the quantization efficiency and computing efficiency.
  • the 106th iteration can be used as the current inspection iteration, and the above operations of determining the reference iteration interval and updating the data bit width are repeated.
  • the processor adjusts the point position corresponding to the iterations in the reference iteration interval according to the acquired point position iteration interval and the data bit width corresponding to the reference iteration interval, so as to adjust the quantization parameters such as the point position in the recurrent neural network operation.
  • the point position iteration interval includes at least one iteration, and the iterated point positions in the point position iteration interval are consistent.
  • the point position iteration interval may be a hyperparameter, for example, the point position iteration interval may be user-defined input.
  • the point position iteration interval is less than or equal to the reference iteration interval.
  • the processor can synchronously update the quantization parameters such as the data bit width and the point position at the current inspection iteration.
  • the scaling factors corresponding to the iterations in the reference iteration interval may be consistent.
  • the offsets corresponding to the iterations in the reference iteration interval are consistent.
  • the quantization parameters such as the data bit width and point position corresponding to the iterations in the reference iteration interval are all the same, so that the amount of calculation can be reduced, and the quantization efficiency and computing efficiency can be improved.
  • the specific implementation process is basically the same as the foregoing embodiment, and may refer to the above description, which will not be repeated here.
  • the processor can update the quantitative parameters such as data bit width and point position at the inspection iteration corresponding to the reference iteration interval, and update at the sub-inspection iteration determined by the point position iteration interval Quantitative parameters such as point position. Since the quantization parameters such as the point position can be fine-tuned according to the data to be quantized when the data bit width is unchanged, the quantization parameters such as the point position can also be adjusted within the same reference iteration interval to further improve the quantization accuracy.
  • the processor may determine a sub-inspection iteration according to the current inspection iteration and the point position iteration interval, the sub-inspection iteration is used to adjust the point position, and the sub-inspection iteration may be an iteration in the reference iteration interval. Further, the processor may adjust the point position corresponding to the iterations in the reference iteration interval according to the data to be quantized in the sub-inspection iteration and the data bit width corresponding to the reference iteration interval, wherein the determination method of the point position may refer to formula (2) or formula (14) above, which will not be repeated here.
  • the current inspection iteration is the 100th iteration
  • the reference iteration interval is 6, and the reference iteration interval includes iterations from the 100th iteration to the 105th iteration.
  • the processor may use the 100th iteration as the aforementioned sub-inspection iteration and calculate the point position s1 corresponding to the 100th iteration.
  • the point position s1 is shared by the 100th iteration, the 101st iteration, and the 102nd iteration for quantization.
  • according to the point position iteration interval I s1 , the processor can use the 103rd iteration as the above-mentioned sub-inspection iteration, and the processor can also determine the point position s2 corresponding to the second point position iteration interval according to the data to be quantized in the 103rd iteration and the data bit width n corresponding to the reference iteration interval; the aforementioned point position s2 is then shared from the 103rd iteration to the 105th iteration for quantization.
  • the value of the aforementioned point position s1 before update and the value of the point position s2 after update may be the same or different.
  • the processor may determine the next reference iteration interval and quantization parameters such as the data bit width and point position corresponding to the next reference iteration interval according to the data variation range of the data to be quantized again in the 106th iteration.
  • the current inspection iteration is the 100th iteration
  • the reference iteration interval is 6, and the reference iteration interval includes iterations from the 101st iteration to the 106th iteration.
  • the processor can determine the point position corresponding to the first point position iteration interval as s1 according to the data to be quantized in the current inspection iteration and the target data bit width n1 corresponding to the current inspection iteration, and the 101st iteration, the 102nd iteration, and the 103rd iteration share the above-mentioned point position s1 for quantization.
  • according to the point position iteration interval I s1 , the processor can use the 104th iteration as the aforementioned sub-inspection iteration, and the processor can also determine the point position s2 corresponding to the second point position iteration interval according to the data to be quantized in the 104th iteration and the data bit width n1 corresponding to the reference iteration interval; the aforementioned point position s2 is then shared from the 104th iteration to the 106th iteration for quantization.
  • the value of the aforementioned point position s1 before update and the value of the point position s2 after update may be the same or different.
  • the processor may determine the next reference iteration interval and the quantization parameters such as the data bit width and point position corresponding to the next reference iteration interval according to the data variation range of the data to be quantized again at the 106th iteration.
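  • The worked example above, in which a reference iteration interval of 6 iterations is split by a point position iteration interval of 3, can be sketched as follows (a simplified illustration; the helper name is hypothetical):

```python
def subcheck_iterations(interval_start, interval_len, pos_interval):
    # Starting iterations of each point position sub-interval inside one
    # reference iteration interval; the data bit width stays fixed throughout.
    return list(range(interval_start, interval_start + interval_len, pos_interval))

# Reference interval covering iterations 101..106, point position interval 3:
checks = subcheck_iterations(101, 6, 3)   # the 101st and 104th iterations
```

In the second worked example, the point position s1 determined at the current inspection iteration is shared by the 101st to 103rd iterations, the point position s2 determined at the 104th iteration is shared by the 104th to 106th iterations, and the data bit width n1 is held constant across the whole interval.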
  • the point position iteration interval may be equal to 1, that is, the point position is updated once for each iteration.
  • the point position iteration interval may be the same or different.
  • the at least one point position iteration interval included in the reference iteration interval may increase sequentially.
  • the scaling factors corresponding to iterations in the reference iteration interval may also be inconsistent.
  • the scaling factor can be updated synchronously with the aforementioned point position, that is, the iteration interval corresponding to the scaling factor can be equal to the aforementioned point position iteration interval. That is, whenever the processor updates the point position, it will update the scaling factor accordingly.
  • the offset corresponding to the iteration in the reference iteration interval may also be inconsistent.
  • the offset may be updated synchronously with the above-mentioned point position, that is, the iteration interval corresponding to the offset may be equal to the above-mentioned point position iteration interval. That is, whenever the processor updates the point position, it will update the offset accordingly.
  • the offset can also be updated asynchronously with the aforementioned point position or data bit width, which is not specifically limited here.
  • the processor may also determine the minimum and maximum values among all the elements of the data to be quantized, and further determine quantization parameters such as point positions and scaling coefficients. For details, please refer to the above description.
  • the processor can comprehensively determine the data change range of the data to be quantized according to the change range of the point position and the data bit width of the data to be quantized, and determine the reference according to the data change range of the data to be quantized Iteration interval, where the reference iteration interval can be used to update and determine the data bit width, that is, the processor can update and determine the data bit width at each inspection iteration of the reference iteration interval.
  • FIG. 16 shows a flowchart of a method for determining a first target iteration interval in a parameter adjustment method of another embodiment of the present disclosure. As shown in FIG. 16, the above method may include:
  • the above-mentioned second error may be determined according to the quantization error, and the second error is positively correlated with the above-mentioned quantization error.
  • the foregoing operation S500 may include:
  • the second error is determined according to the quantization error, and the second error is positively correlated with the quantization error.
  • the quantized data of the current inspection iteration is obtained by quantizing the to-be-quantized data of the current inspection iteration according to the initial data bit width.
  • quantization error determination method please refer to the description in operation S114 above, which will not be repeated here.
  • the second error can be calculated according to the following formula: diff update2 = θ × diff bit ², where
  • diff update2 represents the above-mentioned second error
  • diff bit represents the above-mentioned quantization error
  • θ may be a hyperparameter
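  • A minimal numeric sketch of the second error, assuming a quadratic dependence on the quantization error (the exact functional form and the hyperparameter θ are assumptions; the text only requires that the second error be positively correlated with the quantization error):

```python
def second_error(quant_error, theta=1.0):
    # The second error grows monotonically with the quantization error, so a
    # larger quantization error pushes toward a smaller target iteration
    # interval. The quadratic form and theta are illustrative assumptions.
    return theta * quant_error ** 2

# Positive correlation: a bigger quantization error gives a bigger second error.
assert second_error(3.0) > second_error(2.0)
```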
  • the processor may calculate the target error according to the first error and the second error, and determine the target iteration interval according to the target error.
  • the processor may determine a target iteration interval according to the target error, and the target iteration interval is negatively correlated with the target error. That is, the larger the target error, the smaller the target iteration interval.
  • the target error may also be determined according to the maximum value of the first error and the second error, and at this time, the weight of the first error or the second error takes a value of 0.
  • the foregoing operation S600 may include:
  • the first target iteration interval is determined according to the target error, wherein the target error is negatively correlated with the first target iteration interval.
  • the processor may compare the magnitudes of the first error diff update1 and the second error diff update2 , and when the first error diff update1 is greater than the second error diff update2 , the target error is equal to the first error diff update1 .
  • when the first error diff update1 is less than or equal to the second error diff update2 , the target error is equal to the second error diff update2 .
  • the target error can be the first error diff update1 or the second error diff update2 . That is, the target error diff update can be determined according to the following formula:
  • diff update = max(diff update1 , diff update2 )    formula (35)
  • diff update refers to the target error
  • diff update1 refers to the first error
  • diff update2 refers to the second error
  • the first target iteration interval can be determined as follows:
  • the first target iteration interval can be calculated according to the following formula: I = β / diff update − γ
  • I represents the target iteration interval
  • diff update represents the above-mentioned target error
  • β and γ can be hyperparameters.
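  • Combining formula (35) with an interval rule that is negatively correlated with the target error, a hedged sketch (the specific rule I = β/diff update − γ and all constants here are illustrative assumptions):

```python
def target_iteration_interval(first_error, second_error, beta=100.0, gamma=2.0):
    # Target error is the larger of the first and second errors, per formula (35).
    diff_update = max(first_error, second_error)
    # The interval shrinks as the target error grows (negative correlation);
    # beta and gamma stand in for the hyperparameters in the text.
    return max(int(beta / diff_update - gamma), 1)

# A larger target error produces a smaller (more frequent) update interval:
assert target_iteration_interval(5.0, 0.2) < target_iteration_interval(0.5, 0.2)
```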
  • the data bit width is variable in the cyclic neural network operation, and the change trend of the data bit width can be measured by the second error.
  • the processor can determine the second target iteration interval and the data bit width corresponding to the iterations in the second target iteration interval, where the data bit widths corresponding to the iterations in the second target iteration interval are consistent.
  • the processor may determine the data bit width corresponding to the second target iteration interval according to the to-be-quantized data of the current inspection iteration. In other words, the data bit width during the operation of the cyclic neural network is updated every second target iteration interval.
  • the data bit width corresponding to the second target iteration interval may be the target data bit width of the current inspection iteration.
  • the target data bit width of the current inspection iteration please refer to operations S114 and S115 above, which will not be repeated here.
  • the second target iteration interval may count the number of iterations from the current inspection iteration, that is, the inspection iteration corresponding to the second target iteration interval may be the initial iteration of the second target iteration interval. For example, if the current inspection iteration is the 100th iteration, and the processor determines according to the data variation range of the data to be quantized that the second target iteration interval is 6, the processor may determine that the second target iteration interval includes 6 iterations, which are the 100th iteration to the 105th iteration, respectively.
  • the processor can determine the target data bit width of the 100th iteration, and the target data bit width of the 100th iteration is used from the 101st to the 105th iteration, so there is no need to calculate the target data bit width from the 101st to the 105th iteration, thereby reducing the amount of calculation and improving the quantization efficiency and computing efficiency. After that, the 106th iteration can be used as the current inspection iteration, and the above operations of determining the second target iteration interval and updating the data bit width are repeated.
  • the second target iteration interval may also be calculated from the next iteration of the current inspection iteration, that is, the inspection iteration corresponding to the second target iteration interval may also be the termination iteration of the second target iteration interval.
  • the current inspection iteration is the 100th iteration
  • the processor determines that the iteration interval of the second target iteration interval is 6 according to the data variation range of the data to be quantified. Then the processor may determine that the second target iteration interval includes 6 iterations, which are respectively the 101st iteration to the 106th iteration.
  • the processor can determine the target data bit width of the 100th iteration, and the target data bit width of the 100th iteration is used from the 101st to the 106th iteration, so there is no need to calculate the target data bit width from the 101st to the 106th iteration, which reduces the amount of calculation and improves the quantization efficiency and computing efficiency.
  • the 106th iteration can be used as the current inspection iteration, and the above operations of determining the target iteration interval and updating the data bit width are repeated.
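  • The two counting conventions described above differ only in whether the interval starts at the current inspection iteration or at the iteration after it; a small illustrative helper (the function name is hypothetical):

```python
def interval_members(current_check, interval, from_next=False):
    # Iterations covered by one target iteration interval. With from_next=True
    # the interval is counted from the iteration after the current inspection
    # iteration, which then serves as the interval's termination iteration.
    start = current_check + 1 if from_next else current_check
    return list(range(start, start + interval))

assert interval_members(100, 6) == [100, 101, 102, 103, 104, 105]
assert interval_members(100, 6, from_next=True) == [101, 102, 103, 104, 105, 106]
```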
  • the processor may also determine the quantization parameter in the second target iteration interval at the verification iteration, so as to adjust the quantization parameter in the cyclic neural network operation according to the second target iteration interval. That is, the quantization parameters such as the point position in the cyclic neural network operation can be updated synchronously with the data bit width.
  • the quantization parameters corresponding to the iterations in the second target iteration interval may be consistent.
  • the processor may determine the point position corresponding to the current inspection iteration according to the data to be quantized in the current inspection iteration and the target data bit width corresponding to the current inspection iteration, and use the point position corresponding to the current inspection iteration as the point position corresponding to the second target iteration interval, wherein the point positions corresponding to the iterations in the second target iteration interval are consistent.
  • each iteration in the second target iteration interval uses the quantization parameters such as the point position of the current inspection iteration, which avoids updating and adjusting the quantization parameters in each iteration, thereby reducing the amount of calculation in the quantization process and improving the efficiency of the quantization operation.
  • the scaling factors corresponding to the iterations in the second target iteration interval may be consistent.
  • the processor may determine the scaling factor corresponding to the current inspection iteration according to the to-be-quantized data of the current inspection iteration, and use the scaling factor corresponding to the current inspection iteration as the scaling factor of each iteration in the second target iteration interval. Wherein, the scaling factors corresponding to the iterations in the second target iteration interval are consistent.
  • the offsets corresponding to the iterations in the second target iteration interval are consistent.
  • the processor may determine the offset corresponding to the current inspection iteration according to the to-be-quantized data of the current inspection iteration, and use the offset corresponding to the current inspection iteration as the offset of each iteration in the second target iteration interval. Further, the processor may also determine the minimum and maximum values among all the elements of the data to be quantized, and further determine quantization parameters such as point positions and scaling coefficients. For details, please refer to the above description.
  • the offsets corresponding to the iterations in the second target iteration interval are consistent.
  • the second target iteration interval may count the number of iterations from the current inspection iteration, that is, the inspection iteration corresponding to the second target iteration interval may be the initial iteration of the second target iteration interval. For example, if the current inspection iteration is the 100th iteration, and the processor determines that the second target iteration interval is 3 according to the data variation range of the data to be quantized, the processor may determine that the second target iteration interval includes 3 iterations, which are the 100th iteration, the 101st iteration, and the 102nd iteration, respectively.
  • the processor may determine the quantization parameters such as the point position corresponding to the 100th iteration according to the data to be quantized corresponding to the 100th iteration and the target data bit width, and may use the quantization parameters such as the point position corresponding to the 100th iteration to quantize the 100th iteration, the 101st iteration, and the 102nd iteration.
  • the processor does not need to calculate quantization parameters such as point positions in the 101st iteration and the 102nd iteration, which reduces the amount of calculation in the quantization process and improves the efficiency of the quantization operation.
  • the second target iteration interval may also be calculated from the next iteration of the current inspection iteration, that is, the inspection iteration corresponding to the second target iteration interval may also be the termination iteration of the second target iteration interval.
  • the current inspection iteration is the 100th iteration
  • the processor determines that the second target iteration interval is 3 according to the data variation range of the data to be quantized. Then the processor may determine that the second target iteration interval includes 3 iterations, which are the 101st iteration, the 102nd iteration, and the 103rd iteration, respectively.
  • the processor may determine the quantization parameters such as the point position corresponding to the 100th iteration according to the data to be quantized corresponding to the 100th iteration and the target data bit width, and may use the quantization parameters such as the point position corresponding to the 100th iteration to quantize the 101st iteration, the 102nd iteration, and the 103rd iteration.
  • the processor does not need to calculate quantization parameters such as point positions in the 102nd iteration and the 103rd iteration, which reduces the amount of calculation in the quantization process and improves the efficiency of the quantization operation.
  • the data bit widths and quantization parameters corresponding to each iteration in the same second target iteration interval are consistent, that is, the data bit width, point position, scaling factor, and offset corresponding to each iteration in the same second target iteration interval remain unchanged, so that during the training or fine-tuning process of the recurrent neural network, frequent adjustment of the quantization parameters of the data to be quantized can be avoided, the amount of calculation in the quantization process is reduced, and the quantization efficiency can be improved. In addition, by dynamically adjusting the quantization parameters at different stages of training or fine-tuning according to the data variation range, the quantization accuracy can be ensured.
  • the processor may also determine the quantization parameters in the second target iteration interval according to the point position iteration interval corresponding to quantization parameters such as the point position, so as to adjust the quantization parameters in the recurrent neural network operation. That is, quantization parameters such as the point position in the recurrent neural network operation can be updated asynchronously with the data bit width: the processor can update quantization parameters such as the data bit width and the point position at the inspection iteration of the second target iteration interval, and the processor can also separately update the point position corresponding to the iterations in the second target iteration interval according to the point position iteration interval.
  • the processor may also determine the data bit width corresponding to the second target iteration interval according to the target data bit width corresponding to the current inspection iteration, where the data bit widths corresponding to the iterations in the second target iteration interval are consistent. After that, the processor can adjust the quantitative parameters such as the point position in the cyclic neural network operation process according to the data bit width and the point position iteration interval corresponding to the second target iteration interval.
  • the point position iteration interval includes at least one iteration, and the iterated point positions in the point position iteration interval are consistent.
  • the point position iteration interval may be a hyperparameter, for example, the point position iteration interval may be user-defined input.
  • the above-mentioned method can be used in the training or fine-tuning process of the recurrent neural network to adjust the quantization parameters of the operation data involved in the fine-tuning or training of the recurrent neural network, so as to improve the quantization precision and efficiency of the recurrent neural network operation.
  • the operation data may be at least one of neuron data, weight data, or gradient data.
  • According to the data variation curve of the data to be quantized in Figure 5a, it can be seen that in the initial stage of training or fine-tuning, the difference between the data to be quantized in each iteration is relatively large, and the data variation range of the data to be quantized is relatively severe.
  • the value of the target iteration interval can be small, and the quantization parameter in the target iteration interval can be updated in a timely manner to ensure the quantization accuracy.
  • the data change range of the data to be quantified gradually tends to be flat.
  • the value of the target iteration interval can be increased to avoid frequent updating of quantization parameters to improve quantization efficiency and computing efficiency.
  • the training or fine-tuning of the recurrent neural network tends to be stable (that is, when the forward operation result of the recurrent neural network approaches the preset reference value, the training or fine-tuning of the recurrent neural network tends to be stable); at this time, the value of the target iteration interval can be further increased to further improve the quantization efficiency and computing efficiency.
  • different methods can be used to determine the target iteration interval at different stages of the training or fine-tuning of the cyclic neural network, so as to improve the quantization efficiency and computing efficiency on the basis of ensuring the quantization accuracy.
  • FIG. 17 shows a flowchart of a quantization parameter adjustment method according to still another embodiment of the present disclosure. As shown in FIG. 17, the above method may further include:
  • the processor may further perform operation S712, that is, the processor may further determine whether the current iteration is greater than the second preset iteration.
  • the second preset iteration is greater than the first preset iteration
  • the second preset iteration interval is greater than the first preset iteration interval.
  • the foregoing second preset iteration may be a hyperparameter, and the second preset iteration may be greater than the total number of iterations of at least one training period.
  • the second preset iteration may be determined according to the data variation curve of the data to be quantified.
  • the second preset iteration may also be customized by the user.
  • the processor may perform operation S714, use the second preset iteration interval as the target iteration interval, and adjust the parameters of the neural network quantization process according to the second preset iteration interval. When the current iteration is greater than the first preset iteration and the current iteration is less than the second preset iteration, the processor may perform the above-mentioned operation S713, determine the target iteration interval according to the data variation range of the data to be quantized, and adjust the quantization parameter according to the target iteration interval.
• the processor may read the second preset iteration set by the user and determine the second preset iteration interval according to the correspondence between the second preset iteration and the second preset iteration interval; the second preset iteration interval is greater than the first preset iteration interval.
• when the degree of convergence of the neural network satisfies a preset condition (for example, when the forward operation result of the current iteration approaches the preset reference value), it can be determined that the current iteration is greater than or equal to the second preset iteration.
• for example, when the loss value corresponding to the current iteration is less than or equal to the preset threshold, it can be determined that the degree of convergence of the neural network meets the preset condition.
  • the aforementioned second preset iteration interval may be a hyperparameter, and the second preset iteration interval may be greater than or equal to the total number of iterations of at least one training period.
  • the second preset iteration interval may be customized by the user.
  • the processor can directly read the second preset iteration and the second preset iteration interval input by the user, and update the quantization parameter in the neural network operation according to the second preset iteration interval.
  • the second preset iteration interval may be equal to the total number of iterations of one training period, that is, the quantization parameter is updated once every training period (epoch).
  • the above method also includes:
  • the processor may also determine whether the current data bit width needs to be adjusted at each inspection iteration. If the current data bit width needs to be adjusted, the processor can switch from the above operation S714 to operation S713 to re-determine the data bit width so that the data bit width can meet the requirements of the data to be quantized.
  • the processor may determine whether the data bit width needs to be adjusted according to the above-mentioned second error.
• the processor may also perform the above operation S715 to determine whether the second error is greater than the preset error value; when the current iteration is greater than or equal to the second preset iteration and the second error is greater than the preset error value, the processor switches to operation S713 to determine an iteration interval according to the data variation range of the data to be quantized, so as to re-determine the data bit width according to that iteration interval; otherwise, the processor continues to adjust the parameters in the quantization process of the neural network according to the second preset iteration interval.
• the preset error value may be determined according to the preset threshold corresponding to the quantization error; when the second error is greater than the preset error value, the processor may determine the iteration interval according to the data variation range of the data to be quantized, so as to re-determine the data bit width according to that iteration interval.
  • the second preset iteration interval is the total number of iterations in one training period.
  • the processor may update the quantization parameter according to the second preset iteration interval, that is, the quantization parameter is updated once every training period (epoch).
  • the initial iteration of each training cycle is regarded as a testing iteration.
• the processor can determine the quantization error according to the data to be quantized in the inspection iteration, determine the second error according to the quantization error, and determine whether the second error is greater than the preset error value according to the following formula: diff_update2 = θ × diff_bit > T
• where diff_update2 represents the second error, diff_bit represents the quantization error, θ represents a hyperparameter, and T represents the preset error value.
• the preset error value may be equal to the first preset threshold divided by the hyperparameter θ.
  • the preset error value can also be a hyperparameter.
• when the second error diff_update2 is greater than the preset error value T, it means that the data bit width may not meet the preset requirements; at this time, the second preset iteration interval is no longer used to update the quantization parameters, and the processor determines the target iteration interval according to the data variation range of the data to be quantized, so as to ensure that the data bit width meets the preset requirements. That is, when the second error diff_update2 is greater than the preset error value T, the processor switches from the aforementioned operation S714 to the aforementioned operation S713.
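The second-error check that triggers the switch from S714 back to S713 can be sketched directly from the formula above. The default θ and the choice T = first preset threshold / θ follow one option stated in the text; both values here are illustrative.

```python
def second_error(quant_error, theta=1.0):
    # diff_update2 = theta * diff_bit, where theta is a hyperparameter
    return theta * quant_error

def should_redetermine_bit_width(quant_error, first_threshold, theta=1.0):
    """Return True when the processor should switch from S714 to S713.

    One option stated in the text: T = first preset threshold / theta.
    """
    preset_error_value = first_threshold / theta
    return second_error(quant_error, theta) > preset_error_value
```

For example, with a first preset threshold of 0.1, a quantization error of 0.2 triggers the switch while 0.05 does not.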
  • the processor may determine whether the data bit width needs to be adjusted according to the aforementioned quantization error.
  • the second preset iteration interval is the total number of iterations in one training period.
  • the processor may update the quantization parameter according to the second preset iteration interval, that is, the quantization parameter is updated once every training period (epoch).
  • the initial iteration of each training cycle is used as a test iteration.
• the processor can determine the quantization error according to the data to be quantized in the test iteration, and when the quantization error is greater than or equal to the first preset threshold, it means that the data bit width may not meet the preset requirements; at this time, the processor switches from the above-mentioned operation S714 to the above-mentioned operation S713.
  • the aforementioned quantization parameters such as the position of the point, the scaling factor, and the offset may be displayed on a display device.
  • the user can learn the quantization parameter during the operation of the recurrent neural network through the display device, and the user can also adaptively modify the quantization parameter determined by the processor.
  • the aforementioned data bit width and target iteration interval can also be displayed by the display device.
  • the user can learn the parameters such as the target iteration interval and data bit width during the operation of the cyclic neural network through the display device, and the user can also adaptively modify the parameters such as the target iteration interval and data bit width determined by the processor.
  • An embodiment of the present disclosure also provides a quantization parameter adjustment device 200 of a cyclic neural network.
  • the quantization parameter adjustment device 200 may be installed in a processor.
  • the quantization parameter adjustment device 200 can be placed in a general-purpose processor.
  • the quantization parameter adjustment device can also be placed in an artificial intelligence processor.
• FIG. 18 shows a block diagram of the quantization parameter adjustment device 200 according to an embodiment of the present disclosure.
  • the obtaining module 210 is used to obtain the data change range of the data to be quantified
• the iteration interval determination module 220 is configured to determine a first target iteration interval according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, wherein the first target iteration interval includes at least one iteration, and the quantization parameter of the cyclic neural network is used to implement a quantization operation on the data to be quantized in the operation of the cyclic neural network.
  • the device further includes:
  • the preset interval determination module is configured to adjust the quantization parameter according to the preset iteration interval when the current inspection iteration is less than or equal to the first preset iteration.
  • the iteration interval determination module is further configured to determine the first target iteration interval according to the data variation range of the data to be quantified when the current inspection iteration is greater than the first preset iteration.
  • the iteration interval determination module includes:
• the second target iteration interval determination sub-module is configured to, when the current inspection iteration is greater than or equal to the second preset iteration and the current inspection iteration requires quantization parameter adjustment, determine the second target iteration interval corresponding to the current inspection iteration according to the first target iteration interval and the total number of iterations in each cycle;
• the update iteration determination sub-module is configured to determine the update iteration corresponding to the current inspection iteration according to the second target iteration interval, so as to adjust the quantization parameter in the update iteration, where the update iteration is an iteration after the current inspection iteration;
  • the second preset iteration is greater than the first preset iteration
  • the quantitative adjustment process of the cyclic neural network includes multiple cycles, and the total number of iterations in the multiple cycles is not consistent.
  • the second target iteration interval determination submodule includes:
• the update cycle determination sub-module is configured to determine the update cycle corresponding to the current inspection iteration according to the ordinal number of the current inspection iteration in the current cycle and the total number of iterations in the cycles after the current cycle, where the total number of iterations in the update cycle is greater than or equal to the ordinal number;
• the determining sub-module is configured to determine the second target iteration interval according to the first target iteration interval, the ordinal number, and the total number of iterations in the cycles between the current cycle and the update cycle.
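The sub-modules above leave the exact arithmetic unstated. The sketch below encodes one plausible reading, in which the update iteration occupies the same ordinal position as the current inspection iteration but in a later cycle: the first cycle that lies at least `first_interval` iterations ahead and is long enough to contain that ordinal. All names and the selection rule are assumptions for illustration.

```python
def second_target_interval(ordinal, first_interval, cycle_lengths, cur_cycle):
    """Return (second target iteration interval, update cycle index).

    ordinal        -- 1-based position of the current inspection iteration
                      within the current cycle
    first_interval -- the first target iteration interval (from the data
                      variation range)
    cycle_lengths  -- total number of iterations in each cycle (may differ,
                      as the text notes for recurrent networks)
    """
    # iterations remaining in the current cycle after the inspection iteration
    elapsed = cycle_lengths[cur_cycle] - ordinal
    for cycle in range(cur_cycle + 1, len(cycle_lengths)):
        # candidate update iteration: the same ordinal, in this later cycle
        if elapsed + ordinal >= first_interval and cycle_lengths[cycle] >= ordinal:
            return elapsed + ordinal, cycle
        elapsed += cycle_lengths[cycle]
    raise ValueError("no cycle after the current one can host the update iteration")
```

With cycles of 10, 8, and 12 iterations, an inspection at ordinal 5 of the first cycle and a first target interval of 6 lands the update at ordinal 5 of the second cycle, 10 iterations later.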
  • the iteration interval determination module is further configured to determine that the current inspection iteration is greater than or equal to a second preset iteration when the degree of convergence of the cyclic neural network meets a preset condition.
  • the quantization parameter includes a point position, and the point position is the position of a decimal point in the quantization data corresponding to the data to be quantized; the device further includes:
• the quantization parameter determination module is used to determine the point position corresponding to the iteration in the reference iteration interval according to the target data bit width corresponding to the current inspection iteration and the data to be quantized of the current inspection iteration, so as to adjust the point position in the cyclic neural network operation;
  • the reference iteration interval includes the second target iteration interval or the preset iteration interval.
  • the quantization parameter includes a point position, and the point position is the position of a decimal point in the quantization data corresponding to the data to be quantized; the device further includes:
  • the data bit width determination module is used to determine the data bit width corresponding to the reference iteration interval according to the target data bit width corresponding to the current inspection iteration, wherein the data bit width corresponding to the iteration in the reference iteration interval is consistent, and the reference The iteration interval includes the second target iteration interval or the preset iteration interval;
  • the quantization parameter determination module is configured to adjust the point position corresponding to the iteration in the reference iteration interval according to the acquired point position iteration interval and the data bit width corresponding to the reference iteration interval, so as to adjust the point position in the neural network operation;
  • the point position iteration interval includes at least one iteration, and the iterated point positions in the point position iteration interval are consistent.
  • the point position iteration interval is less than or equal to the reference iteration interval.
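Within one reference iteration interval the data bit width stays fixed while the point position may be refreshed on a shorter sub-interval. A minimal sketch of that schedule (offsets are relative to the start of the reference interval; the function name is illustrative):

```python
def point_position_update_offsets(reference_interval, point_position_interval):
    """Offsets within a reference interval at which the point position is
    re-determined; the data bit width is unchanged throughout."""
    if point_position_interval > reference_interval:
        raise ValueError("point position interval must not exceed the reference interval")
    return list(range(0, reference_interval, point_position_interval))
```

For a reference interval of 8 iterations and a point-position interval of 3, the point position is refreshed at offsets 0, 3, and 6.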
  • the quantization parameter further includes a scaling factor, and the scaling factor is updated synchronously with the point position.
  • the quantization parameter further includes an offset, and the offset is updated synchronously with the point position.
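The roles of the three quantization parameters can be illustrated with the usual fixed-point mapping F ≈ I × 2^s × f + o (s: point position, f: scaling factor, o: offset). The exact formula is not reproduced in this excerpt, so the sketch below assumes that standard form with a symmetric signed integer range:

```python
def quantize_value(x, point_pos, scale=1.0, offset=0.0, bits=8):
    """Map a float to an integer code and its dequantized value (assumed form)."""
    step = (2.0 ** point_pos) * scale          # value of one quantization step
    qmax = 2 ** (bits - 1) - 1
    code = int(round((x - offset) / step))
    code = max(-qmax - 1, min(qmax, code))     # clamp to the representable range
    return code, code * step + offset          # integer code, dequantized value
```

For example, with a point position of -7 the step is 2^-7 = 0.0078125, so 0.5 maps exactly to integer code 64.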
  • the data bit width determination module includes:
• the quantization error determination sub-module is used to determine the quantization error according to the data to be quantized of the current inspection iteration and the quantized data of the current inspection iteration, wherein the quantized data of the current inspection iteration is obtained by quantizing the data to be quantized of the current inspection iteration;
  • the data bit width determination sub-module is used to determine the target data bit width corresponding to the current inspection iteration according to the quantization error.
• the data bit width determining unit is configured to determine the target data bit width corresponding to the current inspection iteration according to the quantization error, specifically: if the quantization error is greater than or equal to a first preset threshold, the data bit width corresponding to the current inspection iteration is increased; if the quantization error is less than or equal to a second preset threshold, the data bit width corresponding to the current inspection iteration is reduced, so as to obtain the target data bit width corresponding to the current inspection iteration.
• the data bit width determining unit is configured to, if the quantization error is greater than or equal to the first preset threshold, increase the data bit width corresponding to the current inspection iteration to obtain a first intermediate data bit width, and return to determining the quantization error according to the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration until the quantization error is less than the first preset threshold; wherein the quantized data of the current inspection iteration is obtained by quantizing the data to be quantized of the current inspection iteration according to the first intermediate data bit width.
• the data bit width determining unit is configured to, if the quantization error is less than or equal to the second preset threshold, reduce the data bit width corresponding to the current inspection iteration to obtain a second intermediate data bit width; wherein the quantized data of the current inspection iteration is obtained by quantizing the data to be quantized of the current inspection iteration according to the second intermediate data bit width.
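The widen/narrow loop above can be sketched end to end. The quantization error measure here is a stand-in (relative mean absolute deviation under symmetric linear quantization), and the thresholds and bit-width bounds are hypothetical; the disclosure specifies only the two-threshold comparison and the repeat-until structure.

```python
import numpy as np

def quantize(x, bits):
    # symmetric linear quantization (illustrative stand-in)
    qmax = 2 ** (bits - 1) - 1
    peak = float(np.max(np.abs(x)))
    scale = peak / qmax if peak > 0 else 1.0
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def quantization_error(x, bits):
    # stand-in error measure: relative mean absolute deviation (assumption)
    return float(np.mean(np.abs(x - quantize(x, bits)))
                 / (np.mean(np.abs(x)) + 1e-12))

def target_bit_width(x, bits, first_threshold, second_threshold,
                     max_bits=16, min_bits=2):
    """Adjust the data bit width until the quantization error sits between
    the two preset thresholds (second_threshold < first_threshold)."""
    err = quantization_error(x, bits)
    while err >= first_threshold and bits < max_bits:   # too coarse: widen
        bits += 1
        err = quantization_error(x, bits)
    while err <= second_threshold and bits > min_bits:  # too fine: narrow
        bits -= 1
        err = quantization_error(x, bits)
    return bits
```

Starting from 2 bits on a ramp of values in [-1, 1], the loop widens until the error drops below the first threshold.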
  • the acquisition module includes:
• the first acquisition module is used to acquire the variation range of the point position; wherein the variation range of the point position can be used to characterize the data variation range of the data to be quantized, and the variation range of the point position is positively correlated with the data variation range of the data to be quantized.
  • the first obtaining module includes:
• the first mean value determining unit is configured to determine the first mean value according to the point position corresponding to the previous inspection iteration before the current inspection iteration and the point positions corresponding to the historical iterations before that previous inspection iteration, wherein the previous inspection iteration is the inspection iteration corresponding to the previous iteration interval before the target iteration interval;
• the second mean value determining unit is configured to determine the second mean value according to the point position corresponding to the current inspection iteration and the point positions of the historical iterations before the current inspection iteration; wherein the point position corresponding to the current inspection iteration is determined according to the target data bit width corresponding to the current inspection iteration and the data to be quantized;
  • the first error determining unit is configured to determine a first error according to the first average value and the second average value, and the first error is used to characterize the variation range of the point position.
  • the second average value determining unit is specifically configured to:
  • the second average value is determined according to the point position of the current inspection iteration and the preset number of intermediate sliding average values.
  • the second average value determining unit is specifically configured to determine the second average value according to a point position corresponding to the current inspection iteration and the first average value.
• the second average value determining unit is configured to update the second average value according to the acquired data bit width adjustment value of the current inspection iteration; wherein the data bit width adjustment value of the current inspection iteration is determined according to the target data bit width and the initial data bit width of the current inspection iteration.
• the second average value determining unit is configured to update the second average value according to the acquired data bit width adjustment value of the current inspection iteration, specifically: when the target data bit width of the current inspection iteration is greater than the initial data bit width, the second average value is reduced according to the data bit width adjustment value of the current inspection iteration; when the target data bit width of the current inspection iteration is less than the initial data bit width, the second average value is increased according to the data bit width adjustment value of the current inspection iteration.
  • the iteration interval determination module is configured to determine the target iteration interval according to the first error, and the target iteration interval is negatively correlated with the first error.
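The moving means, the first error, and the interval mapping above can be sketched together. The smoothing coefficient `alpha` and the `beta / diff - gamma` mapping are hypothetical hyperparameters; the text states only that the second mean blends the current point position with history and that the interval is negatively correlated with the first error.

```python
def sliding_mean(prev_mean, point_position, alpha=0.5):
    # running (exponential) moving average of point positions;
    # alpha is an assumed smoothing hyperparameter
    return alpha * point_position + (1.0 - alpha) * prev_mean

def first_error(mean1, mean2):
    # diff_update1 = |M2 - M1| characterizes the point-position variation range
    return abs(mean2 - mean1)

def first_target_interval(diff_update1, beta=20.0, gamma=2.0, lo=1, hi=100):
    # negatively correlated mapping; beta, gamma, lo, hi are illustrative
    raw = beta / max(diff_update1, 1e-9) - gamma
    return int(max(lo, min(hi, raw)))
```

A larger first error yields a shorter interval: with the defaults, an error of 0.5 gives an interval of 38 while an error of 2.0 gives only 8.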
  • the acquiring module further includes:
  • the second acquisition module is used to acquire the change trend of the data bit width; determine the data change range of the data to be quantified according to the change range of the point position and the change trend of the data bit width.
  • the iteration interval determination module is further configured to determine the target iteration interval according to the acquired first error and second error; wherein, the first error is used to characterize the change of the point position Amplitude, the second error is used to characterize the changing trend of the data bit width.
• the iteration interval determination module is configured to determine the target iteration interval according to the acquired first error and second error, specifically: determining a target error according to the first error and the second error, and determining the target iteration interval according to the target error, wherein the target error is negatively correlated with the target iteration interval.
  • the second error is determined according to a quantization error
  • the quantization error is determined according to the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration, and the second error is positively correlated with the quantization error.
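The text states only that the target iteration interval is negatively correlated with a target error derived from both errors. A common reading in this patent family takes the larger of the two; the combination rule and the constants here are assumptions.

```python
def target_iteration_interval(diff_update1, diff_update2,
                              beta=20.0, gamma=2.0, lo=1, hi=100):
    # target error = max(first error, second error) -- assumed combination;
    # the interval shrinks as either error grows (negative correlation)
    target_err = max(diff_update1, diff_update2)
    raw = beta / max(target_err, 1e-9) - gamma
    return int(max(lo, min(hi, raw)))
```

Because the larger error dominates, a growing second error (bit-width trend) shortens the interval even when the point position is stable.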
  • the iteration interval determination module is further configured to: when the current inspection iteration is greater than or equal to the second preset iteration, and the second error is greater than the preset error value, then according to the waiting The data variation range of the quantified data determines the first target iteration interval.
• the implementation process of each module or unit in the embodiments of the present application is basically the same as that of each operation in the foregoing method.
  • the above device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways.
  • the division of units/modules in the above-mentioned embodiments is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units, modules or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the above-mentioned integrated unit/module can be implemented in the form of hardware or software program module. If the integrated unit/module is implemented in the form of hardware, the hardware may be a digital circuit, an analog circuit, and so on.
  • the physical realization of the hardware structure includes but is not limited to transistors, memristors and so on.
  • the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer readable memory.
  • the technical solution of the present disclosure essentially or the part that contributes to the prior art or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a memory, It includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
• the aforementioned memory includes: a USB flash drive, read-only memory (ROM), random access memory (RAM), a mobile hard disk, a magnetic disk, an optical disk, and other media that can store program codes.
• the present disclosure also provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor or a device, the method according to any of the above-mentioned embodiments is implemented.
  • the quantization parameter of the neural network is used to realize the quantization operation of the data to be quantized in the operation of the cyclic neural network.
  • an artificial intelligence chip is also disclosed, which includes the above-mentioned quantization parameter adjustment device.
• a board card is also disclosed, which includes a storage device, an interface device, a control device, and the above-mentioned artificial intelligence chip; wherein the artificial intelligence chip is respectively connected to the storage device, the control device, and the interface device; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and external equipment; and the control device is used to monitor the state of the artificial intelligence chip.
  • FIG. 19 shows a block diagram of a board according to an embodiment of the present disclosure.
  • the board may include other supporting components in addition to the chip 389 described above.
  • the supporting components include, but are not limited to: a storage device 390, Interface device 391 and control device 392;
  • the storage device 390 is connected to the artificial intelligence chip through a bus for storing data.
• the storage device may include multiple groups of storage units 393; each group of storage units is connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
• the storage device may include 4 groups of storage units, and each group of storage units may include a plurality of DDR4 chips.
• the artificial intelligence chip may include four 72-bit DDR4 controllers; in each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
  • each group of the storage unit includes a plurality of double-rate synchronous dynamic random access memories arranged in parallel.
  • DDR can transmit data twice in one clock cycle.
  • a controller for controlling the DDR is provided in the chip, which is used to control the data transmission and data storage of each storage unit.
  • the interface device is electrically connected with the artificial intelligence chip.
  • the interface device is used to implement data transmission between the artificial intelligence chip and an external device (such as a server or a computer).
  • the interface device may be a standard PCIE interface.
  • the data to be processed is transferred from the server to the chip through a standard PCIE interface to realize data transfer.
  • the interface device may also be other interfaces. The present disclosure does not limit the specific manifestations of the other interfaces mentioned above, as long as the interface unit can realize the switching function.
  • the calculation result of the artificial intelligence chip is still transmitted by the interface device back to an external device (such as a server).
  • the control device is electrically connected with the artificial intelligence chip.
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the artificial intelligence chip and the control device may be electrically connected through an SPI interface.
  • the control device may include a single-chip microcomputer (Micro Controller Unit, MCU).
  • the artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the artificial intelligence chip can be in different working states such as multi-load and light-load.
• the control device can regulate and control the working states of multiple processing chips, multiple processing cores, and/or multiple processing circuits in the artificial intelligence chip.
  • an electronic device which includes the aforementioned artificial intelligence chip.
• Electronic equipment includes data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, webcams, servers, cloud servers, cameras, video cameras, projectors, watches, headsets, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment.
  • the transportation means include airplanes, ships, and/or vehicles;
  • the household appliances include TVs, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes nuclear magnetic resonance, B-ultrasound and/or electrocardiograph.
  • a method for adjusting quantitative parameters of a recurrent neural network comprising:
  • the quantization parameter of the cyclic neural network is used to implement a quantization operation on the data to be quantized in the operation of the cyclic neural network.
  • Clause A2 The method according to Clause A1, the method further comprising:
  • the quantization parameter is adjusted according to the preset iteration interval.
  • determining the first target iteration interval according to the data variation range of the data to be quantified includes:
  • the first target iteration interval is determined according to the data variation range of the data to be quantified.
  • a first target iteration interval is determined according to the data variation range of the data to be quantified, so as to adjust the recurrent neural network according to the first target iteration interval
  • the quantization parameters in the calculation include:
• the second target iteration interval corresponding to the current inspection iteration is determined according to the first target iteration interval and the total number of iterations in each cycle, and the update iteration corresponding to the current inspection iteration is determined according to the second target iteration interval.
  • the second preset iteration is greater than the first preset iteration
  • the quantitative adjustment process of the cyclic neural network includes multiple cycles, and the total number of iterations in the multiple cycles is not consistent.
  • determining the second target iteration interval corresponding to the current inspection iteration according to the first target iteration interval and the total number of iterations in each cycle includes:
• the update period corresponding to the current inspection iteration is determined according to the ordinal number of the current inspection iteration in the current period and the total number of iterations in the periods after the current period, where the total number of iterations in the update period is greater than or equal to the ordinal number;
• the second target iteration interval is determined according to the first target iteration interval, the ordinal number, and the total number of iterations in the periods between the current period and the update period.
  • the first target iteration interval is determined according to the data variation amplitude of the data to be quantified, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, Also includes:
• the point position corresponding to the iteration in the reference iteration interval is determined according to the target data bit width corresponding to the current inspection iteration and the data to be quantized of the current inspection iteration, so as to adjust the point position in the cyclic neural network operation;
  • the reference iteration interval includes the second target iteration interval or the preset iteration interval.
  • the data bit width corresponding to the reference iteration interval is determined, wherein the data bit widths corresponding to the iterations in the reference iteration interval are consistent, and the reference iteration interval includes the second target Iteration interval or the preset iteration interval;
  • the point position iteration interval includes at least one iteration, and the iterated point positions in the point position iteration interval are consistent.
  • the target data bit width corresponding to the current inspection iteration is determined.
  • the data bit width corresponding to the current inspection iteration is reduced to obtain the target data bit width corresponding to the current inspection iteration.
• the determination of the quantization error according to the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration is performed again until the quantization error is less than the first preset threshold; wherein the quantized data of the current inspection iteration is obtained by quantizing the data to be quantized of the current inspection iteration according to the first intermediate data bit width.
• the quantized data of the current inspection iteration is obtained by quantizing the data to be quantized of the current inspection iteration according to the second intermediate data bit width.
  • the obtaining the data variation range of the data to be quantified includes:
• the variation range of the point position is acquired, wherein the variation range of the point position can be used to characterize the data variation range of the data to be quantized, and the variation range of the point position is positively correlated with the data variation range of the data to be quantized.
  • the range of change in the position of the acquisition point includes:
  • a first error is determined according to the first average value and the second average value, and the first error is used to characterize the variation range of the point position.
  • determining the second mean value according to the point position corresponding to the current inspection iteration and the point position of the historical iteration before the current inspection iteration including:
  • the second average value is determined according to the point position of the current inspection iteration and the preset number of intermediate sliding average values.
  • the second average value is determined according to the position of the point corresponding to the current inspection iteration and the first average value.
  • the second average value is updated according to the acquired data bit width adjustment value of the current inspection iteration; wherein the data bit width adjustment value of the current inspection iteration is determined according to the target data bit width and the initial data bit width of the current inspection iteration.
  • the second average value is reduced according to the data bit width adjustment value of the current inspection iteration
  • the second average value is increased according to the data bit width adjustment value of the current inspection iteration.
  • the first target iteration interval is determined according to the first error, and the first target iteration interval is negatively correlated with the first error.
  • said obtaining the data variation range of the data to be quantified further includes:
  • the data change range of the data to be quantized is determined.
  • determining the first target iteration interval according to the data variation range of the data to be quantified further includes:
  • the first target iteration interval is determined according to the acquired first error and second error; wherein the first error is used to characterize the variation range of the point position, and the second error is used to characterize the change trend of the data bit width.
  • determining the first target iteration interval according to the acquired first error and second error includes:
  • the first target iteration interval is determined according to the target error, wherein the target error is negatively correlated with the first target iteration interval.
  • Clause A26 According to the method described in Clause A24 or Clause A25, the second error is determined according to the quantization error in the following manner:
  • the quantization error is determined according to the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration, and the second error is positively correlated with the quantization error.
  • Clause A27 The method according to Clause A4, the method further comprising:
  • the first target iteration interval is determined according to the data variation range of the data to be quantified.
  • Clause A28 The method according to any one of clauses A1 to A27, wherein the data to be quantified is at least one of neuron data, weight data, or gradient data.
  • a quantization parameter adjustment device of a recurrent neural network comprising a memory and a processor, the memory stores a computer program, and when the processor executes the computer program, the implementation is as described in any of clause A1-28 Steps of the method.
  • Clause A30 A computer-readable storage medium with a computer program stored in the computer-readable storage medium, which, when executed, implements the steps of the method described in any one of clauses A1 to A28.
  • a quantitative parameter adjustment device of a recurrent neural network comprising:
  • the acquisition module is used to acquire the data change range of the data to be quantified
  • the iteration interval determination module is configured to determine a first target iteration interval according to the data variation range of the data to be quantified, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, wherein the The target iteration interval includes at least one iteration, and the quantization parameter of the cyclic neural network is used to implement a quantization operation on the data to be quantized in the operation of the cyclic neural network.
  • the preset interval determination module is configured to adjust the quantization parameter according to the preset iteration interval when the current inspection iteration is less than or equal to the first preset iteration.
  • the iteration interval determination module is further configured to determine the first target iteration interval according to the data variation range of the data to be quantified when the current inspection iteration is greater than the first preset iteration.
  • Clause A34 The device according to any one of clauses A31 to A33, wherein the iteration interval determination module includes:
  • the second target iteration interval determination sub-module is configured to, when the current inspection iteration is greater than or equal to the second preset iteration and the current inspection iteration requires quantization parameter adjustment, determine the second target iteration interval corresponding to the current inspection iteration according to the first target iteration interval and the total number of iterations in each cycle;
  • the update iteration determination sub-module determines the update iteration corresponding to the current inspection iteration according to the second target iteration interval, so as to adjust the quantization parameter in the update iteration, and the update iteration is the current inspection Iteration after iteration;
  • the second preset iteration is greater than the first preset iteration
  • the quantitative adjustment process of the cyclic neural network includes multiple cycles, and the total number of iterations in the multiple cycles is not consistent.
  • the update cycle determination sub-module determines the update cycle corresponding to the current inspection iteration according to the number of iterations in the current cycle of the current inspection iteration and the total number of iterations in the cycles after the current cycle.
  • wherein the total number of iterations of the update cycle is greater than or equal to the iteration ordinal number;
  • the determining sub-module determines the second target iteration interval according to the first target iteration interval, the number of iterations, and the total number of iterations in the period between the current period and the update period.
  • the iteration interval determination module is further configured to determine that the current inspection iteration is greater than or equal to a second preset iteration when the degree of convergence of the cyclic neural network meets a preset condition.
  • Clause A37 The device according to Clause A34, wherein the quantization parameter includes a point position, and the point position is the position of a decimal point in the quantization data corresponding to the data to be quantized; the device further includes:
  • the quantization parameter determination module is used to determine the point position corresponding to the iteration in the reference iteration interval according to the target data bit width corresponding to the current inspection iteration and the to-be-quantized data of the current inspection iteration to adjust the points in the cyclic neural network operation position;
  • the reference iteration interval includes the second target iteration interval or the preset iteration interval.
  • Clause A38 The device according to Clause A34, wherein the quantization parameter includes a point position, and the point position is the position of a decimal point in the quantization data corresponding to the data to be quantized; the device further includes:
  • the data bit width determination module is used to determine the data bit width corresponding to the reference iteration interval according to the target data bit width corresponding to the current inspection iteration, wherein the data bit width corresponding to the iteration in the reference iteration interval is consistent, and the reference The iteration interval includes the second target iteration interval or the preset iteration interval;
  • the quantization parameter determination module is configured to adjust the point position corresponding to the iteration in the reference iteration interval according to the acquired point position iteration interval and the data bit width corresponding to the reference iteration interval, so as to adjust the point position in the neural network operation;
  • the point position iteration interval includes at least one iteration, and the iterated point positions in the point position iteration interval are consistent.
  • Clause A40 The device according to any one of clauses A37 to A39, wherein the quantization parameter further includes a scaling factor, and the scaling factor is updated synchronously with the point position.
  • Clause A42 The device according to any one of clauses A37 to A39, wherein the data bit width determination module includes:
  • the quantization error determination sub-module is used to determine the quantization error according to the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration, wherein the quantized data of the current inspection iteration is obtained by quantizing the data to be quantized in the current inspection iteration;
  • the data bit width determination sub-module is used to determine the target data bit width corresponding to the current inspection iteration according to the quantization error.
  • the data bit width corresponding to the current inspection iteration is reduced to obtain the target data bit width corresponding to the current inspection iteration.
  • Clause A44 The device according to clause A43, wherein the data bit width determining unit is configured to, if the quantization error is greater than or equal to a first preset threshold, increase the data bit width corresponding to the current inspection iteration to obtain the When the target data bit width corresponding to the current inspection iteration, it is specifically used for:
  • the execution returns to determining the quantization error according to the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration, until the quantization error is less than the first preset threshold; wherein the quantized data of the current inspection iteration is obtained by quantizing the data to be quantized in the current inspection iteration according to the first intermediate data bit width.
  • Clause A45 The device according to clause A43, wherein the data bit width determining unit is configured to reduce the data bit width corresponding to the current inspection iteration if the quantization error is less than or equal to a second preset threshold to obtain the When describing the target data bit width corresponding to the current inspection iteration, it is specifically used for:
  • the quantized data of the current inspection iteration is obtained by quantizing the data to be quantized in the current inspection iteration according to the second intermediate data bit width.
  • the first acquisition module is used to acquire the variation range of the point position; wherein the variation range of the point position can be used to characterize the data variation range of the data to be quantized, and the variation range of the point position is positively correlated with the data variation range of the data to be quantized.
  • the first mean value determining unit is configured to determine the first mean value according to the point position corresponding to the previous test iteration before the current test iteration and the point position corresponding to the historical iteration before the last test iteration, wherein the last The inspection iteration is the inspection iteration corresponding to the previous iteration interval before the target iteration interval;
  • the second mean value determining unit is configured to determine the second mean value according to the point position corresponding to the current inspection iteration and the point positions of the historical iterations before the current inspection iteration; wherein the point position corresponding to the current inspection iteration is determined according to the target data bit width corresponding to the current inspection iteration and the data to be quantized;
  • the first error determining unit is configured to determine a first error according to the first average value and the second average value, and the first error is used to characterize the variation range of the point position.
  • the second average value is determined according to the point position of the current inspection iteration and the preset number of intermediate sliding average values.
  • Clause A49 The device according to clause A47, wherein the second average value determining unit is specifically configured to determine the second average value according to a point position corresponding to the current inspection iteration and the first average value.
  • Clause A50 The device according to clause A47, wherein the second average value determining unit is configured to update the second average value according to the acquired data bit width adjustment value of the current inspection iteration;
  • the data bit width adjustment value of the current inspection iteration is determined according to the target data bit width and the initial data bit width of the current inspection iteration.
  • the second average value is reduced according to the data bit width adjustment value of the current inspection iteration
  • the second average value is increased according to the data bit width adjustment value of the current inspection iteration.
  • Clause A52 The device according to clause A47, wherein the iteration interval determination module is configured to determine the target iteration interval according to the first error, and the target iteration interval is negatively correlated with the first error.
  • the second acquisition module is used to acquire the change trend of the data bit width; determine the data change range of the data to be quantified according to the change range of the point position and the change trend of the data bit width.
  • Clause A54 The device according to clause A53, wherein the iteration interval determination module is further configured to determine the target iteration interval according to the acquired first error and second error; wherein, the first error is used to characterize a point position The second error is used to characterize the change trend of the data bit width.
  • Clause A55 The device according to clause A53, wherein the iteration interval determination module is configured to determine the target iteration interval according to the acquired first error and second error, specifically for:
  • the target iteration interval is determined according to the target error, wherein the target error is negatively correlated with the target iteration interval.
  • Clause A56 The device according to clause A54 or 55, wherein the second error is determined according to a quantization error
  • the quantization error is determined according to the data to be quantized in the current inspection iteration and the quantized data of the current inspection iteration, and the second error is positively correlated with the quantization error.
  • the iteration interval determination module is further configured to determine the first target iteration interval according to the data variation range of the data to be quantized when the current inspection iteration is greater than or equal to the second preset iteration and the second error is greater than the preset error value.


Abstract

The present disclosure relates to a quantization parameter adjustment method and device for a recurrent neural network, and related products. The method can determine a target iteration interval according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the target iteration interval. The quantization parameter adjustment method and device for a recurrent neural network and the related products of the present disclosure can improve the precision of the quantization process of the recurrent neural network, and improve the quantization efficiency and operation efficiency.

Description

Quantization Parameter Adjustment Method and Device for Recurrent Neural Network, and Related Products

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a quantization parameter adjustment method and device for a recurrent neural network, and related products.
Background

With the continuous development of artificial intelligence technology, its application fields have become increasingly wide, and it has been well applied in fields such as image recognition, speech recognition and natural language processing. However, as the complexity of artificial intelligence algorithms increases, the volume and dimensionality of the data to be processed keep growing, which poses great challenges to the data processing efficiency of computing devices and to the storage capacity and memory-access efficiency of storage devices.

To solve the above technical problem, conventional techniques quantize the operation data of a recurrent neural network with a fixed bit width, that is, convert floating-point operation data into fixed-point operation data, so as to compress the operation data of the recurrent neural network. However, there may be large differences among different operation data of a recurrent neural network, and conventional quantization methods use the same quantization parameter (such as the point position) for the entire recurrent neural network, which often leads to low precision and affects the data operation results.
Summary

In view of this, the present disclosure provides a quantization parameter adjustment method and device for a recurrent neural network, and related products, which can improve the quantization precision of the neural network and guarantee the correctness and reliability of operation results.

The present disclosure provides a quantization parameter adjustment method for a neural network, the method comprising:

obtaining the data variation range of the data to be quantized;

determining a first target iteration interval according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, wherein the first target iteration interval includes at least one iteration, and the quantization parameter of the recurrent neural network is used to implement the quantization operation on the data to be quantized in the recurrent neural network operation.
The present disclosure further provides a quantization parameter adjustment device for a recurrent neural network, comprising a memory and a processor, the memory storing a computer program; when the processor executes the computer program, the steps of any of the above methods are implemented. Specifically, when the processor executes the computer program, the following operations are implemented:

obtaining the data variation range of the data to be quantized;

determining a first target iteration interval according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, wherein the first target iteration interval includes at least one iteration, and the quantization parameter of the recurrent neural network is used to implement the quantization operation on the data to be quantized in the recurrent neural network operation.

The present disclosure further provides a computer-readable storage medium storing a computer program; when the computer program is executed, the steps of any of the above methods are implemented. Specifically, when the computer program is executed, the following operations are implemented:

obtaining the data variation range of the data to be quantized;

determining a first target iteration interval according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, wherein the first target iteration interval includes at least one iteration, and the quantization parameter of the recurrent neural network is used to implement the quantization operation on the data to be quantized in the recurrent neural network operation.
The present disclosure further provides a quantization parameter adjustment device for a recurrent neural network, the device comprising:

an acquisition module, used to obtain the data variation range of the data to be quantized;

an iteration interval determination module, used to determine a first target iteration interval according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, wherein the target iteration interval includes at least one iteration, and the quantization parameter of the recurrent neural network is used to implement the quantization operation on the data to be quantized in the recurrent neural network operation.

The quantization parameter adjustment method and device for a recurrent neural network and the related products of the present disclosure obtain the data variation range of the data to be quantized and determine a first target iteration interval according to that data variation range, so that the quantization parameter of the recurrent neural network can be adjusted according to the first target iteration interval; in this way, the quantization parameters of the recurrent neural network at different operation stages can be determined according to the data distribution characteristics of the data to be quantized. Compared with the prior-art approach of using the same quantization parameters for all kinds of operation data of the same recurrent neural network, the method and device of the present disclosure can improve the precision of the quantization process of the recurrent neural network, thereby guaranteeing the accuracy and reliability of the operation results. Further, determining a target iteration interval can also improve the quantization efficiency.
Brief Description of the Drawings

The drawings, which are included in and constitute a part of the specification, together with the specification illustrate exemplary embodiments, features and aspects of the present disclosure, and serve to explain the principles of the present disclosure.

Fig. 1 is a schematic diagram of the application environment of a quantization parameter adjustment method according to an embodiment of the present disclosure;

Fig. 2 is a schematic diagram of the correspondence between data to be quantized and quantized data according to an embodiment of the present disclosure;

Fig. 3 is a schematic diagram of the conversion of data to be quantized according to an embodiment of the present disclosure;

Fig. 4 is a flowchart of a quantization parameter adjustment method for a recurrent neural network according to an embodiment of the present disclosure;

Fig. 5a is a variation trend diagram of the data to be quantized during the operation process according to an embodiment of the present disclosure;

Fig. 5b is an unrolled schematic diagram of a recurrent neural network according to an embodiment of the present disclosure;

Fig. 5c is a cycle schematic diagram of a recurrent neural network according to an embodiment of the present disclosure;

Fig. 6 is a flowchart of a parameter adjustment method for a recurrent neural network according to an embodiment of the present disclosure;

Fig. 7 is a flowchart of a method for determining the variation range of the point position according to an embodiment of the present disclosure;

Fig. 8 is a flowchart of a method for determining the second mean value according to an embodiment of the present disclosure;

Fig. 9 is a flowchart of a data bit width adjustment method according to an embodiment of the present disclosure;

Fig. 10 is a flowchart of a data bit width adjustment method according to another embodiment of the present disclosure;

Fig. 11 is a flowchart of a data bit width adjustment method according to yet another embodiment of the present disclosure;

Fig. 12 is a flowchart of a data bit width adjustment method according to still another embodiment of the present disclosure;

Fig. 13 is a flowchart of a method for determining the second mean value according to another embodiment of the present disclosure;

Fig. 14 is a flowchart of a quantization parameter adjustment method according to another embodiment of the present disclosure;

Fig. 15 is a flowchart of adjusting the quantization parameter in the quantization parameter adjustment method according to an embodiment of the present disclosure;

Fig. 16 is a flowchart of a method for determining the first target iteration interval in the parameter adjustment method according to another embodiment of the present disclosure;

Fig. 17 is a flowchart of a quantization parameter adjustment method according to still another embodiment of the present disclosure;

Fig. 18 is a structural block diagram of a quantization parameter adjustment device according to an embodiment of the present disclosure;

Fig. 19 is a structural block diagram of a board card according to an embodiment of the present disclosure.
Detailed Description

The technical solutions in the embodiments of the present disclosure will be described below clearly and completely with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some rather than all of the embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort shall fall within the protection scope of the present disclosure.

It should be understood that the terms "first", "second", etc. in the claims, specification and drawings of the present disclosure are used to distinguish different objects rather than to describe a particular order. The terms "include" and "comprise" used in the specification and claims of the present disclosure indicate the presence of the described features, wholes, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components and/or collections thereof.

It should also be understood that the terms used in this specification are only for the purpose of describing particular embodiments and are not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise. It should be further understood that the term "and/or" used in the specification and claims of the present disclosure refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
As the complexity of artificial intelligence algorithms increases, the volume and dimensionality of the data to be processed also keep growing, while conventional recurrent neural network algorithms usually use a floating-point data format to perform recurrent neural network operations. The ever-growing data volume therefore poses great challenges to the data processing efficiency of computing devices and to the storage capacity and memory-access efficiency of storage devices. To solve the above problems, the operation data involved in the recurrent neural network operation process can be quantized, that is, the operation data represented in floating point is converted into operation data represented in fixed point, thereby reducing the storage capacity required of the storage device, improving memory-access efficiency and improving the operation efficiency of the computing device. However, conventional quantization methods use the same data bit width and quantization parameter (such as the position of the decimal point) to quantize the different operation data of the recurrent neural network throughout the whole training process. Because different operation data differ from each other, or the operation data at different stages of training differ, quantizing with the above method often results in insufficient precision, which affects the operation results.

On this basis, the present disclosure provides a quantization parameter adjustment method for a recurrent neural network, which can be applied in a quantization parameter adjustment device including a memory 110 and a processor 120. Fig. 1 is a structural block diagram of the quantization parameter adjustment device 100. The processor 120 of the quantization parameter adjustment device 100 may be a general-purpose processor or an artificial intelligence processor, or may include both a general-purpose processor and an artificial intelligence processor, which is not specifically limited here. The memory 110 may be used to store the operation data in the recurrent neural network operation process, and the operation data may be one or more of neuron data, weight data or gradient data. The memory 110 may also be used to store a computer program; when the computer program is executed by the above processor 120, the quantization parameter adjustment method in the embodiments of the present disclosure can be implemented. The method can be applied in the training or fine-tuning process of a recurrent neural network, and dynamically adjusts the quantization parameters of the operation data according to the distribution characteristics of the operation data at different stages of the training or fine-tuning process, thereby improving the precision of the quantization process of the recurrent neural network and guaranteeing the accuracy and reliability of the operation results.

Unless otherwise specified, the artificial intelligence processor may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP or ASIC. Unless otherwise specified, the memory may be any appropriate magnetic or magneto-optical storage medium, such as resistive random access memory RRAM (Resistive Random Access Memory), dynamic random access memory DRAM (Dynamic Random Access Memory), static random access memory SRAM (Static Random-Access Memory), enhanced dynamic random access memory EDRAM (Enhanced Dynamic Random Access Memory), high-bandwidth memory HBM (High-Bandwidth Memory) or hybrid memory cube HMC (Hybrid Memory Cube), etc.
For a better understanding of the present disclosure, the quantization process and the quantization parameters involved in the quantization process in the embodiments of the present disclosure are first introduced below.

In the embodiments of the present disclosure, quantization refers to converting operation data in a first data format into operation data in a second data format. The operation data in the first data format may be operation data represented in floating point, and the operation data in the second data format may be operation data represented in fixed point. Since operation data represented in floating point usually occupies a large storage space, converting the floating-point operation data into fixed-point operation data can save storage space and improve the memory-access efficiency and operation efficiency of the operation data.

Optionally, the quantization parameters in the quantization process may include a point position and/or a scaling factor, where the point position refers to the position of the decimal point in the quantized operation data, and the scaling factor refers to the ratio between the maximum value of the quantized data and the maximum absolute value of the data to be quantized. Further, the quantization parameters may also include an offset; the offset, which is intended for asymmetric data to be quantized, refers to the middle value of the multiple elements of the data to be quantized; specifically, the offset may be the midpoint value of the multiple elements of the data to be quantized. When the data to be quantized is symmetric, the quantization parameters may not include an offset, and quantization parameters such as the point position and/or the scaling factor can be determined according to the data to be quantized.
Fig. 2 is a schematic diagram of the correspondence between data to be quantized and quantized data according to an embodiment of the present disclosure. As shown in Fig. 2, the data to be quantized is symmetric about the origin. Let Z1 be the maximum absolute value of the elements of the data to be quantized, n the data bit width corresponding to the data to be quantized, and A the maximum value that can be represented by the quantized data obtained by quantizing the data to be quantized with data bit width n, where A = 2^s × (2^(n-1) − 1). A needs to contain Z1, and Z1 must be greater than A/2, so the constraint of formula (1) holds:

2^s × (2^(n-1) − 1) ≥ Z1 > 2^(s-1) × (2^(n-1) − 1)    Formula (1)
The processor can calculate the point position s according to the maximum absolute value Z1 of the data to be quantized and the data bit width n. For example, the point position s corresponding to the data to be quantized can be calculated with the following formula (2):

s = ceil(log2(Z1 / (2^(n-1) − 1)))    Formula (2)

where ceil denotes rounding up, Z1 is the maximum absolute value of the data to be quantized, s is the point position, and n is the data bit width.
In this case, when the point position s is used to quantize the data to be quantized, the data to be quantized F_x represented in floating point can be expressed as F_x ≈ I_x × 2^s, where I_x denotes the quantized n-bit binary representation value and s denotes the point position. The quantized data corresponding to the data to be quantized is:

I_x = round(F_x / 2^s)    Formula (3)

where s is the point position, I_x is the quantized data, F_x is the data to be quantized, and round denotes the rounding-off operation. It is understandable that other rounding operations, such as rounding up, rounding down or rounding toward zero, may also be used to replace the rounding-off operation in formula (3). It is understandable that, for a given data bit width, the more digits after the decimal point in the quantized data obtained according to the point position, the greater the quantization precision of the quantized data.
Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

F_x1 = round(F_x / 2^s) × 2^s    Formula (4)

where s is the point position determined according to the above formula (2), F_x is the data to be quantized, and round denotes the rounding-off operation. F_x1 may be the data obtained by dequantizing the above quantized data I_x; the data representation format of the intermediate representation data F_x1 is consistent with that of the data to be quantized F_x, and the intermediate representation data F_x1 can be used to calculate the quantization error, as detailed below. Here, dequantization refers to the inverse process of quantization.
Optionally, the scaling factor may include a first scaling factor, which can be calculated as follows:

f1 = Z1 / A    Formula (5)

where Z1 is the maximum absolute value of the data to be quantized, and A is the maximum value that can be represented by the quantized data obtained by quantizing the data to be quantized with data bit width n, A = 2^s × (2^(n-1) − 1).
In this case, the processor may quantize the data to be quantized F_x by combining the point position and the first scaling factor to obtain the quantized data:

I_x = round(F_x / (2^s × f1))    Formula (6)

where s is the point position determined according to the above formula (2), f1 is the first scaling factor, I_x is the quantized data, F_x is the data to be quantized, and round denotes the rounding-off operation. It is understandable that other rounding operations, such as rounding up, rounding down or rounding toward zero, may also be used to replace the rounding-off operation in formula (6).
Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

F_x1 = round(F_x / (2^s × f1)) × 2^s × f1    Formula (7)

where s is the point position determined according to the above formula (2), f1 is the scaling factor, F_x is the data to be quantized, and round denotes the rounding-off operation. F_x1 may be the data obtained by dequantizing the above quantized data I_x; the data representation format of the intermediate representation data F_x1 is consistent with that of the data to be quantized F_x, and the intermediate representation data F_x1 can be used to calculate the quantization error, as detailed below. Here, dequantization refers to the inverse process of quantization.
Optionally, the scaling factor may also include a second scaling factor, which can be calculated as follows:

f2 = Z1 / (2^(n-1) − 1)    Formula (8)

The processor may quantize the data to be quantized F_x using the second scaling factor alone to obtain the quantized data:

I_x = round(F_x / f2)    Formula (9)

where f2 is the second scaling factor, I_x is the quantized data, F_x is the data to be quantized, and round denotes the rounding-off operation. It is understandable that other rounding operations, such as rounding up, rounding down or rounding toward zero, may also be used to replace the rounding-off operation in formula (9). It is understandable that, for a given data bit width, using different scaling factors can adjust the numerical range of the quantized data.
Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

F_x1 = round(F_x / f2) × f2    Formula (10)

where f2 is the second scaling factor, F_x is the data to be quantized, and round denotes the rounding-off operation. F_x1 may be the data obtained by dequantizing the above quantized data I_x; the data representation format of the intermediate representation data F_x1 is consistent with that of the data to be quantized F_x, and the intermediate representation data F_x1 can be used to calculate the quantization error, as detailed below. Here, dequantization refers to the inverse process of quantization.
Further, the above second scaling factor may be determined according to the point position and the first scaling factor f1, that is, the second scaling factor can be calculated with the following formula:

f2 = 2^s × f1    Formula (11)

where s is the point position determined according to the above formula (2), and f1 is the first scaling factor calculated according to the above formula (5).
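The identity in formula (11) follows directly from formulas (5) and (8), since f1 = Z1/A with A = 2^s × (2^(n-1) − 1). A small illustrative sketch checking this (function name is an assumption for illustration):

```python
import math

def scaling_factors(z1, n):
    # s per formula (2); A = 2^s * (2^(n-1) - 1)
    s = math.ceil(math.log2(z1 / (2 ** (n - 1) - 1)))
    a = 2 ** s * (2 ** (n - 1) - 1)
    f1 = z1 / a                    # formula (5): first scaling factor
    f2 = z1 / (2 ** (n - 1) - 1)   # formula (8): second scaling factor
    return s, f1, f2
```

For any Z1 and n, f2 equals 2^s × f1 up to floating-point tolerance, consistent with formula (11).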
Optionally, the quantization method of the embodiments of the present disclosure can quantize not only symmetric data but also asymmetric data. In this case, the processor may convert the asymmetric data into symmetric data to avoid data "overflow". Specifically, the quantization parameters may also include an offset, which may be the midpoint value of the data to be quantized and may be used to represent the offset of the midpoint of the data to be quantized relative to the origin. Fig. 3 is a schematic diagram of the conversion of the data to be quantized according to an embodiment of the present disclosure. As shown in Fig. 3, the processor may make statistics on the data distribution of the data to be quantized to obtain the minimum value Z_min and the maximum value Z_max among all elements of the data to be quantized, and then calculate the above offset according to the minimum value Z_min and the maximum value Z_max. The specific offset calculation is as follows:

o = (Z_max + Z_min) / 2    Formula (12)

where o denotes the offset, Z_min denotes the minimum value among all elements of the data to be quantized, and Z_max denotes the maximum value among all elements of the data to be quantized.
Further, the processor may determine the maximum absolute value Z2 of the data to be quantized according to the minimum value Z_min and the maximum value Z_max among all elements of the data to be quantized:

Z2 = (Z_max − Z_min) / 2    Formula (13)
In this way, the processor may translate the data to be quantized according to the offset o, converting the asymmetric data to be quantized into symmetric data to be quantized, as shown in Fig. 3. The processor may further determine the point position s according to the maximum absolute value Z2 of the data to be quantized, where the point position can be calculated with the following formula:

s = ceil(log2(Z2 / (2^(n-1) − 1)))    Formula (14)

where ceil denotes rounding up, s is the point position, and n is the data bit width.
Afterwards, the processor may quantize the data to be quantized according to the offset and its corresponding point position to obtain the quantized data:

I_x = round((F_x − o) / 2^s)    Formula (15)

where s is the point position determined according to the above formula (14), o is the offset, I_x is the quantized data, F_x is the data to be quantized, and round denotes the rounding-off operation. It is understandable that other rounding operations, such as rounding up, rounding down or rounding toward zero, may also be used to replace the rounding-off operation in formula (15).
Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

F_x1 = round((F_x − o) / 2^s) × 2^s + o    Formula (16)

where s is the point position determined according to the above formula (14), o is the offset, F_x is the data to be quantized, and round denotes the rounding-off operation. F_x1 may be the data obtained by dequantizing the above quantized data I_x; the data representation format of the intermediate representation data F_x1 is consistent with that of the data to be quantized F_x, and the intermediate representation data F_x1 can be used to calculate the quantization error, as detailed below. Here, dequantization refers to the inverse process of quantization.
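Formulas (12) to (16) together describe offset-based quantization of asymmetric data. A minimal illustrative sketch (function name and the choice of a list input are assumptions for illustration):

```python
import math

def asymmetric_quantize(data, n):
    # Offset per formula (12), Z2 per formula (13), s per formula (14)
    z_min, z_max = min(data), max(data)
    o = (z_max + z_min) / 2
    z2 = (z_max - z_min) / 2
    s = math.ceil(math.log2(z2 / (2 ** (n - 1) - 1)))
    # Formula (15): quantize the shifted (now symmetric) data
    quantized = [round((fx - o) / 2 ** s) for fx in data]
    # Formula (16): dequantize, restoring the offset
    restored = [ix * 2 ** s + o for ix in quantized]
    return quantized, restored, s, o
```

For instance, for data in [0, 16] with an 8-bit width, the offset o = 8 shifts the range to the symmetric interval [−8, 8] before the point position is applied.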
Further optionally, the processor may further determine the point position s and the first scaling factor f1 according to the maximum absolute value Z2 of the data to be quantized, where the specific calculation of the point position s can be seen in the above formula (14). The first scaling factor f1 can be calculated with the following formula:

f1 = Z2 / A    Formula (17)

The processor may quantize the data to be quantized according to the offset and its corresponding first scaling factor f1 and point position s to obtain the quantized data:

I_x = round((F_x − o) / (2^s × f1))    Formula (18)

where f1 is the first scaling factor, s is the point position determined according to the above formula (14), o is the offset, I_x is the quantized data, F_x is the data to be quantized, and round denotes the rounding-off operation. It is understandable that other rounding operations, such as rounding up, rounding down or rounding toward zero, may also be used to replace the rounding-off operation in formula (18).
Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

F_x1 = round((F_x − o) / (2^s × f1)) × 2^s × f1 + o    Formula (19)

where f1 is the first scaling factor, s is the point position determined according to the above formula (14), o is the offset, F_x is the data to be quantized, and round denotes the rounding-off operation. F_x1 may be the data obtained by dequantizing the above quantized data I_x; the data representation format of the intermediate representation data F_x1 is consistent with that of the data to be quantized F_x, and the intermediate representation data F_x1 can be used to calculate the quantization error, as detailed below. Here, dequantization refers to the inverse process of quantization.
Optionally, the scaling factor may also include a second scaling factor, which can be calculated as follows:

f2 = Z2 / (2^(n-1) − 1)    Formula (20)

The processor may quantize the data to be quantized F_x using the second scaling factor alone to obtain the quantized data:

I_x = round((F_x − o) / f2)    Formula (21)

where f2 is the second scaling factor, o is the offset, I_x is the quantized data, F_x is the data to be quantized, and round denotes the rounding-off operation. It is understandable that other rounding operations, such as rounding up, rounding down or rounding toward zero, may also be used to replace the rounding-off operation in formula (21). It is understandable that, for a given data bit width, using different scaling factors can adjust the numerical range of the quantized data.
Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

F_x1 = round((F_x − o) / f2) × f2 + o    Formula (22)

where f2 is the second scaling factor, o is the offset, F_x is the data to be quantized, and round denotes the rounding-off operation. F_x1 may be the data obtained by dequantizing the above quantized data I_x; the data representation format of the intermediate representation data F_x1 is consistent with that of the data to be quantized F_x, and the intermediate representation data F_x1 can be used to calculate the quantization error, as detailed below. Here, dequantization refers to the inverse process of quantization.
Further, the above second scaling factor may be determined according to the point position and the first scaling factor f1, that is, the second scaling factor can be calculated with the following formula:

f2 = 2^s × f1    Formula (23)

where s is the point position determined according to the above formula (14), and f1 is the first scaling factor calculated according to the above formula (17).
Optionally, the processor may also quantize the data to be quantized according to the offset o alone, in which case the point position s and/or the scaling factor may be preset values. In this case, the processor quantizes the data to be quantized according to the offset to obtain the quantized data:

I_x = round(F_x − o)    Formula (24)

where o is the offset, I_x is the quantized data, F_x is the data to be quantized, and round denotes the rounding-off operation. It is understandable that other rounding operations, such as rounding up, rounding down or rounding toward zero, may also be used to replace the rounding-off operation in formula (24). It is understandable that, for a given data bit width, using different offsets can adjust the offset between the values of the quantized data and the data before quantization.
Further, the intermediate representation data F_x1 corresponding to the data to be quantized may be:

F_x1 = round(F_x − o) + o    Formula (25)

where o is the offset, F_x is the data to be quantized, and round denotes the rounding-off operation. F_x1 may be the data obtained by dequantizing the above quantized data I_x; the data representation format of the intermediate representation data F_x1 is consistent with that of the data to be quantized F_x, and the intermediate representation data F_x1 can be used to calculate the quantization error, as detailed below. Here, dequantization refers to the inverse process of quantization.
The quantization operation of the present disclosure can be used not only for the quantization of the above floating-point data, but also for the quantization of fixed-point data. Optionally, the operation data in the first data format may also be operation data represented in fixed point, and the operation data in the second data format may be operation data represented in fixed point, where the data representation range of the second data format is smaller than that of the first data format and the number of decimal digits of the second data format is larger than that of the first data format, i.e., the operation data in the second data format has a higher precision than the operation data in the first data format. For example, the operation data in the first data format is fixed-point data occupying 16 bits, and the second data format may be fixed-point data occupying 8 bits. In the embodiments of the present disclosure, quantization can thus also be performed on operation data represented in fixed point, thereby further reducing the storage space occupied by the operation data and improving the memory-access efficiency and operation efficiency of the operation data.
The quantization parameter adjustment method for a recurrent neural network of an embodiment of the present disclosure can be applied in the training or fine-tuning process of a recurrent neural network, so that the quantization parameters of the operation data in the recurrent neural network operation process are dynamically adjusted during training or fine-tuning, thereby improving the quantization precision of the recurrent neural network. The recurrent neural network may be a deep recurrent neural network or a convolutional recurrent neural network, etc., which is not specifically limited here.
It should be clear that training a recurrent neural network refers to performing multiple iterations (iteration) on a recurrent neural network (whose weights may be random numbers) so that the weights of the recurrent neural network can satisfy a preset condition. One iteration generally includes one forward operation, one backward operation and one weight update operation. The forward operation refers to the process of performing forward inference according to the input data of the recurrent neural network to obtain a forward operation result. The backward operation is the process of determining a loss value according to the forward operation result and a preset reference value, and determining a weight gradient value and/or an input data gradient value according to the loss value. The weight update operation refers to the process of adjusting the weights of the recurrent neural network according to the weight gradient value. Specifically, the training process of a recurrent neural network is as follows: the processor may perform a forward operation on the input data using a recurrent neural network whose weights are random numbers to obtain a forward operation result; the processor then determines a loss value according to the forward operation result and a preset reference value, and determines a weight gradient value and/or an input data gradient value according to the loss value; finally, the processor may update the gradient values of the recurrent neural network according to the weight gradient value to obtain new weights, completing one iteration. The processor executes iterations in a loop until the forward operation result of the recurrent neural network satisfies a preset condition. For example, the training ends when the forward operation result of the recurrent neural network converges to the preset reference value, or when the loss value determined from the forward operation result and the preset reference value is less than or equal to a preset precision.

Fine-tuning refers to performing multiple iterations on a recurrent neural network (whose weights are already in a converged state rather than random numbers) so that the precision of the recurrent neural network can satisfy a preset requirement. The fine-tuning process is basically the same as the above training process and can be regarded as retraining a recurrent neural network in a converged state. Inference (Inference) refers to the process of performing forward operations with a recurrent neural network whose weights satisfy preset conditions to realize functions such as recognition or classification, e.g., using a recurrent neural network for image recognition.
In the embodiments of the present disclosure, during the above training or fine-tuning process of the recurrent neural network, different quantization parameters may be used at different stages of the recurrent neural network operation to quantize the operation data of the recurrent neural network, and iterations are performed based on the quantized data, so that the data storage space during the recurrent neural network operation can be reduced and the data access efficiency and operation efficiency improved. Fig. 4 is a flowchart of a quantization parameter adjustment method for a recurrent neural network according to an embodiment of the present disclosure. As shown in Fig. 4, the method may include steps S100 to S200.

In step S100, the data variation range of the data to be quantized is obtained.

Optionally, the processor may directly read the data variation range of the data to be quantized, which may be input by a user.
Optionally, the processor may also calculate the above data variation range of the data to be quantized according to the data to be quantized in the current inspection iteration and the data to be quantized in historical iterations, where the current inspection iteration refers to the currently executed iteration and a historical iteration refers to an iteration executed before the current inspection iteration. For example, the processor may obtain the maximum value and the mean value of the elements in the data to be quantized in the current inspection iteration, as well as the maximum value and the mean value of the elements in the data to be quantized in each historical iteration, and determine the variation range of the data to be quantized according to the maximum values and mean values of the elements in the respective iterations. If the maximum value of the elements in the data to be quantized in the current inspection iteration is close to the maximum values of the elements in the data to be quantized in a preset number of historical iterations, and the mean value of the elements in the data to be quantized in the current inspection iteration is close to the mean values of the elements in the data to be quantized in the preset number of historical iterations, it can be determined that the data variation range of the data to be quantized is small; otherwise, it can be determined that the data variation range of the data to be quantized is large. Alternatively, the data variation range of the data to be quantized may be represented by the sliding average or variance of the data to be quantized, which is not specifically limited here.

In the embodiments of the present disclosure, the data variation range of the data to be quantized can be used to determine whether the quantization parameters of the data to be quantized need to be adjusted. For example, if the data variation range of the data to be quantized is large, the quantization parameters need to be adjusted in time to guarantee the quantization precision. If the data variation range of the data to be quantized is small, the current inspection iteration and a certain number of iterations after it may continue to use the quantization parameters of historical iterations, thereby avoiding frequent adjustment of the quantization parameters and improving quantization efficiency.
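The max/mean comparison described above can be sketched as follows. This is an illustrative sketch only: the patent does not define "close", so the relative tolerance `rel_tol` and the function name are assumptions:

```python
def data_variation_is_small(cur_data, history, rel_tol=0.1):
    # Compare the max absolute value and mean of the current inspection
    # iteration's data against those of a preset number of historical iterations.
    cur_max = max(abs(x) for x in cur_data)
    cur_mean = sum(cur_data) / len(cur_data)
    for hist_data in history:
        h_max = max(abs(x) for x in hist_data)
        h_mean = sum(hist_data) / len(hist_data)
        # "close" is taken here as within a relative tolerance (an assumption)
        if abs(cur_max - h_max) > rel_tol * max(h_max, 1e-12):
            return False
        if abs(cur_mean - h_mean) > rel_tol * max(abs(h_mean), 1e-12):
            return False
    return True
```

A small variation result would let the current quantization parameters be reused; a large one would trigger an adjustment.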
Each iteration involves at least one piece of data to be quantized, which may be operation data represented in floating point or operation data represented in fixed point. Optionally, the data to be quantized in each iteration may be at least one of neuron data, weight data or gradient data, and the gradient data may further include neuron gradient data, weight gradient data, etc.

In step S200, a first target iteration interval is determined according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval, wherein the first target iteration interval includes at least one iteration, and the quantization parameter of the recurrent neural network is used to implement the quantization operation on the data to be quantized in the recurrent neural network operation.
Optionally, the quantization parameter may include the above point position and/or scaling factor, where the scaling factor may include a first scaling factor and a second scaling factor. The specific point position calculation can be seen in the above formula (2), and the scaling factor calculation in the above formula (5) or (8), which will not be repeated here. Optionally, the quantization parameter may also include an offset, whose calculation can be seen in the above formula (12); furthermore, the processor may also determine the point position according to formula (14) and the scaling factor according to the above formula (17) or (20). In the embodiments of the present application, the processor may update at least one of the above point position, scaling factor or offset according to the determined target iteration interval to adjust the quantization parameter in the recurrent neural network operation. That is, the quantization parameter in the recurrent neural network operation can be updated according to the data variation range of the data to be quantized in the recurrent neural network operation, so that the quantization precision can be guaranteed.
It is understandable that a data variation curve of the data to be quantized can be obtained by making statistics on and analyzing the variation trend of the operation data during the training or fine-tuning of the recurrent neural network. Fig. 5a is a variation trend diagram of the data to be quantized during the operation process according to an embodiment of the present disclosure. As shown in Fig. 5a, it can be seen from the data variation curve that in the early stage of training or fine-tuning of the recurrent neural network, the data variation of the data to be quantized across iterations is relatively drastic, and as the training or fine-tuning proceeds, the data variation of the data to be quantized across iterations gradually flattens out. Therefore, in the early stage of training or fine-tuning of the recurrent neural network, the quantization parameters can be adjusted relatively frequently; in the middle and late stages of training or fine-tuning, the quantization parameters can be adjusted again only after an interval of multiple iterations or cycles. The method of the present disclosure strikes a balance between quantization precision and quantization efficiency by determining an appropriate iteration interval.
Specifically, the processor may determine the first target iteration interval according to the data variation range of the data to be quantized, so as to adjust the quantization parameter in the recurrent neural network operation according to the first target iteration interval. Optionally, the first target iteration interval may increase as the data variation range of the data to be quantized decreases. That is, the larger the data variation range of the data to be quantized, the smaller the first target iteration interval, indicating more frequent adjustment of the quantization parameters; the smaller the data variation range of the data to be quantized, the larger the first target iteration interval, indicating less frequent adjustment of the quantization parameters. Of course, in other embodiments, the above first target iteration interval may also be a hyperparameter; for example, the first target iteration interval may be user-defined.

Optionally, the above various kinds of data to be quantized, such as weight data, neuron data and gradient data, may each have a different iteration interval. Correspondingly, the processor may separately obtain the data variation ranges corresponding to the various kinds of data to be quantized, so as to determine the first target iteration interval corresponding to each kind of data to be quantized according to its own data variation range. That is, the quantization processes of the various kinds of data to be quantized may be performed asynchronously. In the embodiments of the present disclosure, since different kinds of data to be quantized differ from each other, the data variation ranges of the different data to be quantized can be used to determine the corresponding first target iteration intervals, and the corresponding quantization parameters can be determined according to the respective first target iteration intervals, so that the quantization precision of the data to be quantized and thus the correctness of the operation results of the recurrent neural network can be guaranteed.

Of course, in other embodiments, the same target iteration interval (including any of the first target iteration interval, the preset iteration interval and the second target iteration interval) may be determined for different kinds of data to be quantized, so as to adjust the quantization parameters corresponding to the respective data to be quantized according to that target iteration interval. For example, the processor may separately obtain the data variation ranges of the various kinds of data to be quantized, determine the target iteration interval according to the largest data variation range, and determine the quantization parameters of the various kinds of data to be quantized according to that target iteration interval. Furthermore, different kinds of data to be quantized may also use the same quantization parameters.
Further optionally, the above recurrent neural network may include at least one operation layer, and the data to be quantized may be at least one of the neuron data, weight data or gradient data involved in each operation layer. In this case, the processor may obtain the data to be quantized involved in the current operation layer, and determine the data variation ranges of the various kinds of data to be quantized in the current operation layer and the corresponding first target iteration intervals according to the above method.

Optionally, the processor may determine the above data variation range of the data to be quantized once in every iteration, and determine a first target iteration interval once according to the corresponding data variation range. That is, the processor may calculate the first target iteration interval once per iteration. The specific calculation of the first target iteration interval can be seen in the description below. Further, the processor may select inspection iterations from the iterations according to preset conditions, determine the variation range of the data to be quantized at each inspection iteration, and update and adjust the quantization parameters, etc., according to the first target iteration interval corresponding to the inspection iteration. In this case, if an iteration is not a selected inspection iteration, the processor may ignore the first target iteration interval corresponding to that iteration.

Optionally, each target iteration interval may correspond to one inspection iteration, which may be the starting iteration of the target iteration interval or the ending iteration of the target iteration interval. The processor may adjust the quantization parameters of the recurrent neural network at the inspection iteration of each target iteration interval, so as to adjust the quantization parameters of the recurrent neural network operation according to the target iteration interval. The inspection iteration may be the time point for checking whether the current quantization parameters satisfy the requirements of the data to be quantized. The quantization parameters before adjustment may be the same as or different from those after adjustment. Optionally, the interval between adjacent inspection iterations may be greater than or equal to one target iteration interval.
For example, the target iteration interval may count the number of iterations starting from the current inspection iteration, and the current inspection iteration may be the starting iteration of the target iteration interval. For instance, if the current inspection iteration is the 100th iteration and the processor determines, according to the data variation range of the data to be quantized, that the target iteration interval is 3, the processor may determine that the target iteration interval includes 3 iterations, namely the 100th, 101st and 102nd iterations. The processor may adjust the quantization parameter in the recurrent neural network operation at the 100th iteration. Here, the current inspection iteration is the iteration corresponding to the quantization parameter update and adjustment currently performed by the processor.

Optionally, the target iteration interval may also count the number of iterations starting from the iteration next to the current inspection iteration, and the current inspection iteration may be the ending iteration of the previous iteration interval before the current inspection iteration. For instance, if the current inspection iteration is the 100th iteration and the processor determines, according to the data variation range of the data to be quantized, that the target iteration interval is 3, the processor may determine that the target iteration interval includes 3 iterations, namely the 101st, 102nd and 103rd iterations. The processor may adjust the quantization parameter in the recurrent neural network operation at the 100th and 103rd iterations. The present disclosure does not specifically limit how the target iteration interval is determined.
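The two counting conventions in the examples above can be sketched as follows (a minimal illustrative sketch; the function name and flag are assumptions):

```python
def interval_iterations(check_iter, interval, from_current=True):
    # from_current=True: the interval starts at the current inspection iteration
    # from_current=False: the interval starts at the iteration after it
    start = check_iter if from_current else check_iter + 1
    return list(range(start, start + interval))
```

With a current inspection iteration of 100 and an interval of 3, the first convention yields iterations 100-102 and the second yields 101-103, matching the two examples in the text.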
Fig. 5b is an unrolled schematic diagram of a recurrent neural network according to an embodiment of the present disclosure. As shown in Fig. 5b, an unrolled diagram of the hidden layer of the recurrent neural network is given, where t-1, t, t+1 denote the time series and X denotes the input sample. St denotes the memory of the sample at time t, St = f(W*St-1 + U*Xt). W denotes the weight of the input, U denotes the weight of the sample input at this moment, and V denotes the weight of the output sample. Since different recurrent neural networks unroll into different numbers of layers, the total numbers of iterations contained in different cycles differ when updating the quantization parameters of the recurrent neural network. Fig. 5c is a cycle schematic diagram of a recurrent neural network according to an embodiment of the present disclosure. As shown in Fig. 5c, iter 1, iter 2, iter 3 and iter 4 are four cycles of the recurrent neural network, where the first cycle iter 1 includes four iterations t0, t1, t2 and t3; the second cycle iter 2 includes two iterations t0 and t1; the third cycle iter 3 includes three iterations t0, t1 and t2; and the fourth cycle iter 4 includes five iterations t0, t1, t2, t3 and t4. When calculating when the recurrent neural network can update the quantization parameters, the calculation needs to take into account the total numbers of iterations in the different cycles.
In one embodiment, it can be seen from the above calculation formulas of the point position, scaling factor and offset that the quantization parameter is often related to the data to be quantized. Therefore, in the above operation S100, the data variation range of the data to be quantized may also be determined indirectly through the variation range of the quantization parameter, i.e., the data variation range of the data to be quantized may be characterized by the variation range of the quantization parameter. Specifically, Fig. 6 is a flowchart of a parameter adjustment method for a recurrent neural network according to an embodiment of the present disclosure. As shown in Fig. 6, the above operation S100 may include operation S110, and operation S200 may include operation S210 (see the description below for details).

S110: Obtain the variation range of the point position, wherein the variation range of the point position can be used to characterize the data variation range of the data to be quantized, and the variation range of the point position is positively correlated with the data variation range of the data to be quantized.

Optionally, the variation range of the point position can indirectly reflect the variation range of the data to be quantized. The variation range of the point position may be determined according to the point position of the current inspection iteration and the point positions of at least one historical iteration. The point position of the current inspection iteration and the point positions of the historical iterations can be determined according to formula (2); of course, they can also be determined according to formula (14).

For example, the processor may also calculate the variance of the point position of the current inspection iteration and the point positions of historical iterations, and determine the variation range of the point position according to the variance. Alternatively, the processor may determine the variation range of the point position according to the mean values of the point position of the current inspection iteration and the point positions of historical iterations. Specifically, as shown in Fig. 7, the above operation S110 may include operations S111 to S113, and operation S210 may include operation S211 (see the description below for details).
S111、根据所述当前检验迭代之前的上一检验迭代对应的点位置,以及所述上一检验迭代之前的历史迭代对应的点位置,确定第一均值。其中,上一检验迭代为上一次调整所述量化参数时对应的迭代,上一检验迭代与所述当前检验迭代之间间隔至少一个迭代间隔。
可选地,至少一次历史迭代可以分属于至少一个迭代间隔中,每个迭代间隔可以对应有一个检验迭代,相邻的两个检验迭代可以具有一个迭代间隔。上述操作S111中的上一检验迭代可以是目标迭代间隔之前的上一迭代间隔对应的检验迭代。
可选地,该第一均值可以按照如下公式进行计算:
M1=a1×s t-1+a2×s t-2+a3×s t-3+…+am×s 1    公式(26)
其中,a1~am是指各次迭代的点位置对应的计算权重,s t-1是指上一检验迭代对应的点位置,s t-2,s t-3…s 1是指上一检验迭代之前的历史迭代对应的点位置,M1是指上述的第一均值。进一步地,根据数据的分布特性,历史迭代与该上一检验迭代距离越远,对该上一检验迭代附近的迭代的点位置的分布及变动幅度影响越小,因此,上述计算权重可以按照a1~am的顺序依次减小。
例如,上一检验迭代为循环神经网络运算的第100次迭代,历史迭代可以是第1次迭代至第99次迭代,则处理器可以获得该第100次迭代的点位置(即s t-1),并获得该第100次迭代之前的历史迭代的点位置,即s 1可以指循环神经网络的第1次迭代对应的点位置……,s t-3可以指循环神经网络的第98次迭代对应的点位置,s t-2可以指循环神经网络的第99次迭代对应的点位置。进一步地,处理器可以根据上述公式计算获得第一均值。
更进一步地,该第一均值可以根据各个迭代间隔对应的检验迭代的点位置进行计算。例如,该第一均值可以按照如下公式进行计算:
M1=a1×s t-1+a2×s t-2+a3×s t-3+…+am×s 1
其中,a1~am是指各次检验迭代的点位置对应的计算权重,s t-1是指上一检验迭代对应的点位置,s t-2,s t-3…s 1是指上一检验迭代之前的预设数量的迭代间隔的检验迭代对应的点位置,M1是指上述的第一均值。
例如,上一检验迭代为循环神经网络运算的第100次迭代,历史迭代可以是第1次迭代至第99次迭代,该99次历史迭代可以分属于11个迭代间隔。比如,第1次迭代至第9次迭代属于第1个迭代间隔,第10次迭代至第18次迭代属于第2个迭代间隔,……,第90次迭代至第99次迭代属于第11个迭代间隔。则处理器可以获得该第100次迭代的点位置(即s t-1),并获得该第100次迭代之前的迭代间隔中检验迭代的点位置,即s 1可以指循环神经网络的第1个迭代间隔的检验迭代对应的点位置(比如s 1可以指循环神经网络的第1次迭代对应的点位置),……,s t-3可以指循环神经网络的第10个迭代间隔的检验迭代对应的点位置(比如s t-3可以指循环神经网络的第81次迭代对应的点位置),s t-2可以指循环神经网络的第11个迭代间隔的检验迭代对应的点位置(比如,s t-2可以指循环神经网络的第90次迭代对应的点位置)。进一步地,处理器可以根据上述公式计算获得第一均值M1。
本公开实施例中,为方便举例说明,假定该迭代间隔包含的迭代数量相同。而在实际使用过程中,如图5c所示,该循环神经网络中迭代间隔包含的迭代数量不相同。可选地,该迭代间隔包含的迭代数量随迭代的增加而增加,即随着循环神经网络训练或微调的进行,迭代间隔可以越来越大。
再进一步地,为进一步简化计算,降低数据占用的存储空间,上述第一均值M1可以采用如下公式进行计算:
M1=α×s t-1+(1-α)×M0       公式(27)
其中,α是指上一检验迭代对应的点位置的计算权重,s t-1是指上一检验迭代对应的点位置,M0是指该上一检验迭代之前的检验迭代对应的滑动平均值,该M0的具体计算方式可参照上述的M1的计算方式,此处不再赘述。
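进一步地,公式(27)的滑动平均递推计算可以用如下Python代码草图示意。其中α=0.9仅为假设取值,点位置序列亦为假设数据,并非本公开限定的实现:

```python
def update_moving_mean(prev_mean, point_loc, alpha=0.9):
    """公式(27)的示意实现:M1 = α×s_{t-1} + (1-α)×M0。

    prev_mean 对应上一检验迭代之前的滑动平均值M0,
    point_loc 对应上一检验迭代的点位置s_{t-1},alpha为假设的超参数。
    """
    return alpha * point_loc + (1 - alpha) * prev_mean

# 随检验迭代逐次更新滑动平均值(点位置取值仅为示意)
m1 = 0.0
for s in (-4, -4, -3, -3):
    m1 = update_moving_mean(m1, s)
```

由于距离上一检验迭代越近的点位置权重越大,该递推形式可以用常数级的存储开销近似公式(26)中按a1~am递减加权的第一均值。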
S112、根据当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值。其中,当前检验迭代对应的点位置可以根据当前检验迭代的目标数据位宽和待量化数据确定。
可选地,该第二均值M2可以按照如下公式进行计算:
M2=b1×s t+b2×s t-1+b3×s t-2+…+bm×s 1    公式(28)
其中,b1~bm是指各次迭代的点位置对应的计算权重,s t是指当前检验迭代对应的点位置,s t-1,s t-2…s 1是指当前检验迭代之前的历史迭代对应的点位置,M2是指上述的第二均值。进一步地,根据数据的分布特性,历史迭代与该当前检验迭代距离越远,对该当前检验迭代附近的迭代的点位置的分布及变动幅度影响越小,因此,上述计算权重可以按照b1~bm的顺序依次减小。
例如,当前检验迭代为循环神经网络运算的第101次迭代,该当前检验迭代之前的历史迭代是指第1次迭代至第100次迭代。则处理器可以获得该第101次迭代的点位置(即s t),并获得该第101次迭代之前的历史迭代的点位置,即s 1可以指循环神经网络的第1次迭代对应的点位置……,s t-2可以指循环神经网络的第99次迭代对应的点位置,s t-1可以指循环神经网络的第100次迭代对应的点位置。进一步地,处理器可以根据上述公式计算获得第二均值M2。
可选地,该第二均值可以根据各个迭代间隔对应的检验迭代的点位置进行计算。具体地,图8示出本公开一实施例中第二均值的确定方法的流程图,如图8所示,上述操作S112可以包括如下操作:
S1121、获取预设数量的中间滑动平均值,其中,各个所述中间滑动平均值是根据所述当前检验迭代之前所述预设数量的检验迭代确定,所述检验迭代为调整所述神经网络量化过程中的参数时对应的迭代;
S1122、根据所述当前检验迭代的点位置以及所述预设数量的中间滑动平均值,确定所述第二均值。
例如,该第二均值可以按照如下公式进行计算:
M2=b1×s t+b2×s t-1+b3×s t-2+…+bm×s 1
其中,b1~bm是指各次迭代的点位置对应的计算权重,s t是指当前检验迭代对应的点位置,s t-1,s t-2…s 1是指当前检验迭代之前的检验迭代对应的点位置,M2是指上述的第二均值。
例如,当前检验迭代为第100次迭代,历史迭代可以是第1次迭代至第99次迭代,该99次历史迭代可以分属于11个迭代间隔。比如,第1次迭代至第9次迭代属于第1个迭代间隔,第10次迭代至第18次迭代属于第2个迭代间隔,……,第90次迭代至第99次迭代属于第11个迭代间隔。则处理器可以获得该第100次迭代的点位置(即s t),并获得该第100次迭代之前的迭代间隔中检验迭代的点位置,即s 1可以指循环神经网络的第1个迭代间隔的检验迭代对应的点位置(比如s 1可以指循环神经网络的第1次迭代对应的点位置),……,s t-2可以指循环神经网络的第10个迭代间隔的检验迭代对应的点位置(比如s t-2可以指循环神经网络的第81次迭代对应的点位置),s t-1可以指循环神经网络的第11个迭代间隔的检验迭代对应的点位置(比如,s t-1可以指循环神经网络的第90次迭代对应的点位置)。进一步地,处理器可以根据上述公式计算获得第二均值M2。
本公开实施例中,为方便举例说明,假定该迭代间隔包含的迭代数量相同。而在实际使用过程中,该迭代间隔包含的迭代数量可以不相同。可选地,该迭代间隔包含的迭代数量随迭代的增加而增加,即随着循环神经网络训练或微调的进行,迭代间隔可以越来越大。
更进一步地,为简便计算,降低数据占用的存储空间,处理器可以根据所述当前检验迭代对应的点位置以及所述第一均值,确定所述第二均值,即上述第二均值可以采用如下公式进行计算:
M2=β×s t+(1-β)×M1      公式(29)
其中,β是指当前检验迭代对应的点位置的计算权重,M1是指上述的第一均值。
S113、根据所述第一均值和所述第二均值确定第一误差,所述第一误差用于表征所述当前检验迭代及所述历史迭代的点位置的变动幅度。
可选地,第一误差可以等于第二均值与上述的第一均值之间的差值的绝对值。具体地,上述的第一误差可以按照如下公式进行计算:
diff update1=|M2-M1|=β|s t-M1|      公式(30)
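结合公式(29)与公式(30),第一误差的计算可以用如下草图示意(β=0.9与各点位置取值均为假设):

```python
def first_error(point_loc, m1, beta=0.9):
    """公式(29)、(30)的示意实现:
    M2 = β×s_t + (1-β)×M1,diff_update1 = |M2 - M1| = β×|s_t - M1|。
    """
    m2 = beta * point_loc + (1 - beta) * m1
    return abs(m2 - m1), m2

diff1, m2 = first_error(point_loc=-2.0, m1=-3.0)
```

当前检验迭代的点位置相对第一均值偏离越大,第一误差越大,后文据此确定的第一目标迭代间隔也就越小。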
可选地,上述的当前检验迭代的点位置可以根据当前检验迭代的待量化数据和当前检验迭代对应的目标数据位宽确定,具体的点位置计算方式可以参见上文的公式(2)或公式(14)。其中,上述当前检验迭代对应的目标数据位宽可以是超参数。进一步可选地,该当前检验迭代对应的目标数据位宽可以是用户自定义输入的。可选地,在循环神经网络训练或微调过程中的待量化数据对应的数据位宽可以是不变的,即同一循环神经网络的同种待量化数据采用同一数据位宽进行量化,例如,针对该循环神经网络在各次迭代中的神经元数据均采用8比特的数据位宽进行量化。
可选地,循环神经网络训练或微调过程中的待量化数据对应的数据位宽为可变的,以保证数据位宽能够满足待量化数据的量化需求。也就是说,处理器可以根据待量化数据,自适应的调整该待量化数据对应的数据位宽,获得该待量化数据对应的目标数据位宽。具体地,处理器可以首先确定当前检验迭代对应的目标数据位宽,之后,处理器可以根据该当前检验迭代对应的目标数据位宽及该当前检验迭代对应的待量化数据,确定当前检验迭代对应的点位置。
图9示出本公开一实施例中数据位宽调整方法的流程图,如图9所示,上述操作S110可以包括:
S114、根据所述当前检验迭代的待量化数据和所述当前检验迭代的量化数据,确定量化误差,其中,所述当前检验迭代的量化数据是通过对所述当前检验迭代的待量化数据进行量化获得。
可选地,上述处理器可以采用初始数据位宽对待量化数据进行量化,获得上述的量化数据。该当前检验迭代的初始数据位宽可以是超参数,该当前检验迭代的初始数据位宽也可以是根据该当前检验迭代之前的上一检验迭代的待量化数据确定的。
具体地,处理器可以根据当前检验迭代的待量化数据和当前检验迭代的量化数据,确定中间表示数据。可选地,所述中间表示数据与上述的待量化数据的表示格式一致。例如,处理器可以对上述的量化数据进行反量化,获得与待量化数据的表示格式一致的中间表示数据,其中,反量化是指量化的逆过程。例如,该量化数据可以采用上述公式(3)获得,处理器还可以按照上述公式(4)对量化数据进行反量化,获得相应的中间表示数据,并根据待量化数据和中间表示数据确定量化误差。
进一步地,处理器可以根据待量化数据及其对应的中间表示数据计算获得量化误差。设当前检验迭代的待量化数据为F x=[z 1,z 2…,z m],该待量化数据对应的中间表示数据为F x1=[z 1 (n),z 2 (n)…,z m (n)]。处理器可以根据该待量化数据F x及其对应的中间表示数据F x1确定误差项,并根据该误差项确定量化误差。
可选地,处理器可以根据中间表示数据F x1中各元素的和,以及待量化数据F x中各元素的和确定上述的误差项,该误差项可以是中间表示数据F x1中各元素的和与待量化数据F x中各元素的和的差值。之后,处理器可以根据该误差项确定量化误差。具体的量化误差可以按照如下公式进行确定:
diff bit=log 2((Σ i|z i (n)|-Σ i|z i|)/Σ i|z i|+1)    公式(31)
其中,z i为待量化数据中的元素,z i (n)为中间表示数据F x1的元素。
可选地,处理器可以分别计算待量化数据中各元素与中间表示数据F x1中相应元素的差值,获得m个差值,并将该m个差值的和作为误差项。之后,处理器可以根据该误差项确定量化误差。具体的量化误差可以按照如下公式确定:
diff bit=log 2(Σ i|z i (n)-z i|/Σ i|z i|+1)    公式(32)
其中,z i为待量化数据中的元素,z i (n)为中间表示数据F x1的元素。
可选地,上述待量化数据中各元素与中间表示数据F x1中相应元素的差值可以约等于2 s-1,因此,上述量化误差还可以按照如下公式确定:
diff bit=log 2(2 s-1×m/Σ i|z i|+1)    公式(33)
其中,m为目标数据对应的中间表示数据F x1的数量,s为点位置,z i为待量化数据中的元素。
可选地,所述中间表示数据也可以与上述的量化数据的数据表示格式一致,并根据该中间表示数据和量化数据确定量化误差。例如,待量化数据可以表示为:F x≈I x×2 s,则可以确定出中间表示数据I x1=F x/2 s,该中间表示数据I x1可以与上述的量化数据具有相同的数据表示格式。此时处理器可以根据中间表示数据I x1和上述公式(3)计算获得的量化数据I x,确定量化误差。具体的量化误差确定方式可参照上述的公式(31)~公式(33)。
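上述先量化再反量化、进而计算量化误差的过程可以用如下Python草图示意。其中按点位置做对称定点量化的方式以及公式(32)形式的误差计算均为根据上文描述的假设性实现,待量化数据取值仅为示意:

```python
import math

def quant_dequant(data, s):
    """以点位置s做定点量化再反量化,得到与待量化数据格式一致的中间表示数据(示意)。"""
    return [round(z / 2 ** s) * 2 ** s for z in data]

def quant_error(data, mid):
    """按公式(32)的形式计算量化误差:log2(Σ|z_i^(n) - z_i| / Σ|z_i| + 1)。"""
    num = sum(abs(a - b) for a, b in zip(mid, data))
    den = sum(abs(z) for z in data)
    return math.log2(num / den + 1)

data = [0.3, -1.2, 2.7]          # 假设的待量化数据F_x
mid = quant_dequant(data, s=0)   # 中间表示数据F_x1
err = quant_error(data, mid)
```

量化误差随量化前后数据的偏差增大而增大;当中间表示数据与待量化数据完全一致时,量化误差为0。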
S115、根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽。
具体地,处理器可以根据该量化误差,自适应地调整当前检验迭代对应的数据位宽,确定该当前检验迭代调整后的目标数据位宽。当该量化误差满足预设条件时,则可以保持当前检验迭代对应的数据位宽不变,即该当前检验迭代的目标数据位宽可以等于初始数据位宽。当量化误差不满足预设条件时,处理器可以调整当前检验迭代的待量化数据对应的数据位宽,获得当前检验迭代对应的目标数据位宽。当处理器采用该目标数据位宽对当前检验迭代的待量化数据进行量化时,量化误差满足上述的预设条件。可选地,上述的预设条件可以是用户设置的预设阈值。
可选地,图10示出本公开另一实施例中数据位宽调整方法的流程图,如图10所示,上述操作S115可以包括:
S1150、处理器可以判断上述的量化误差是否大于或等于第一预设阈值。
若所述量化误差大于或等于第一预设阈值,则可以执行操作S1151,增大所述当前检验迭代对应的数据位宽,获得当前检验迭代的目标数据位宽。当量化误差小于第一预设阈值时,则可以保持当前检验迭代的数据位宽不变。
进一步可选地,处理器可以经过一次调整获得上述的目标数据位宽。例如,当前检验迭代的初始数据位宽为n1,处理器可以经一次调整确定该目标数据位宽n2=n1+t,其中,t为数据位宽的调整值。其中,采用该目标数据位宽n2对当前检验迭代的待量化数据进行量化时,获得的量化误差可以小于所述第一预设阈值。
进一步可选地,处理器可以经过多次调整获得目标数据位宽,直至量化误差小于第一预设阈值,并将该量化误差小于第一预设阈值时的数据位宽作为目标数据位宽。具体地,若所述量化误差大于或等于第一预设阈值,则根据第一预设位宽步长确定第一中间数据位宽;之后处理器可以根据该第一中间数据位宽对当前检验迭代的待量化数据进行量化,获得量化数据,并根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差小于所述第一预设阈值。处理器可以将该量化误差小于第一预设阈值时对应的数据位宽,作为该目标数据位宽。
例如,当前检验迭代的初始数据位宽为n1,处理器可以采用该初始数据位宽n1对当前检验迭代的待量化数据A进行量化,获得量化数据B1,并根据该待量化数据A和量化数据B1计算获得量化误差C1。在量化误差C1大于或等于第一预设阈值时,处理器确定第一中间数据位宽n2=n1+t1,其中,t1为第一预设位宽步长。之后,处理器可以根据该第一中间数据位宽n2对当前检验迭代的待量化数据进行量化,获得当前检验迭代的量化数据B2,并根据该待量化数据A和量化数据B2计算获得量化误差C2。若该量化误差C2大于或等于第一预设阈值时,处理器确定第一中间数据位宽n2=n1+t1+t1,之后根据该新的第一中间数据位宽对当前检验迭代的待量化数据A进行量化,并计算相应的量化误差,直至量化误差小于第一预设阈值。若量化误差C1小于第一预设阈值,则可以保持该初始数据位宽n1不变。
更进一步地,上述的第一预设位宽步长可以是恒定值,例如,每当量化误差大于第一预设阈值时,则处理器可以将当前检验迭代对应的数据位宽增大相同的位宽值。可选地,上述的第一预设位宽步长也可以是可变值,例如,处理器可以计算量化误差与第一预设阈值的差值,若该量化误差与第一预设阈值的差值越小,则第一预设位宽步长的值越小。
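上述按第一预设位宽步长逐步增大数据位宽、直至量化误差小于第一预设阈值的过程可以示意如下。其中误差随位宽变化的函数、阈值th1、步长step以及位宽上限max_bits均为假设取值:

```python
def increase_bitwidth(err_fn, n1, th1, step=2, max_bits=32):
    """多次调整数据位宽:量化误差 >= 第一预设阈值th1时按第一预设位宽步长step增大位宽,
    直至误差小于阈值或达到位宽上限(max_bits为假设的保护上限)。"""
    n = n1
    while err_fn(n) >= th1 and n + step <= max_bits:
        n += step
    return n

# 假设量化误差随位宽增大单调减小(仅用于示意)
target_n = increase_bitwidth(lambda n: 1.0 / n, n1=8, th1=0.1)
```

量化误差小于或等于第二预设阈值时按第二预设位宽步长减小位宽的过程与之对称,此处不再给出。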
可选地,图11示出本公开又一实施例中数据位宽调整方法的流程图,如图11所示,上述操作S115还可以包括:
S1152、处理器可以判断上述的量化误差是否小于或等于第二预设阈值。
若所述量化误差小于或等于第二预设阈值,则可以执行操作S1153,减小所述当前检验迭代对应的数据位宽,获得当前检验迭代的目标数据位宽。当量化误差大于第二预设阈值时,则可以保持当前检验迭代的数据位宽不变。
进一步可选地,处理器可以经过一次调整获得上述的目标数据位宽。例如,当前检验迭代的初始数据位宽为n1,处理器可以经一次调整确定该目标数据位宽n2=n1-t,其中,t为数据位宽的调整值。其中,采用该目标数据位宽n2对当前检验迭代的待量化数据进行量化时,获得的量化误差可以大于所述第二预设阈值。
进一步可选地,处理器可以经过多次调整获得目标数据位宽,直至量化误差大于第二预设阈值,并将该量化误差大于第二预设阈值时的数据位宽作为目标数据位宽。具体地,若所述量化误差小于或等于第二预设阈值,则根据第二预设位宽步长确定第二中间数据位宽;之后处理器可以根据该第二中间数据位宽对当前检验迭代的待量化数据进行量化,获得量化数据,并根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差大于所述第二预设阈值。处理器可以将该量化误差大于第二预设阈值时对应的数据位宽,作为该目标数据位宽。
例如,当前检验迭代的初始数据位宽为n1,处理器可以采用该初始数据位宽n1对当前检验迭代的待量化数据A进行量化,获得量化数据B1,并根据该待量化数据A和量化数据B1计算获得量化误差C1。在量化误差C1小于或等于第二预设阈值时,处理器确定第二中间数据位宽n2=n1-t2,其中,t2为第二预设位宽步长。之后,处理器可以根据该第二中间数据位宽n2对当前检验迭代的待量化数据进行量化,获得当前检验迭代的量化数据B2,并根据该待量化数据A和量化数据B2计算获得量化误差C2。若该量化误差C2小于或等于第二预设阈值时,处理器确定第二中间数据位宽n2=n1-t2-t2,之后根据该新的第二中间数据位宽对当前检验迭代的待量化数据A进行量化,并计算相应的量化误差,直至量化误差大于第二预设阈值。若量化误差C1大于第二预设阈值,则可以保持该初始数据位宽n1不变。
更进一步地,上述的第二预设位宽步长可以是恒定值,例如,每当量化误差小于第二预设阈值时,则处理器可以将当前检验迭代对应的数据位宽减小相同的位宽值。可选地,上述的第二预设位宽步长也可以是可变值,例如,处理器可以计算量化误差与第二预设阈值的差值,若该量化误差与第二预设阈值的差值越小,则第二预设位宽步长的值越小。
可选地,图12示出本公开再一实施例中数据位宽调整方法的流程图,如图12所示,当处理器确定量化误差小于第一预设阈值,且量化误差大于第二预设阈值时,可以保持当前检验迭代的数据位宽不变,其中,第一预设阈值大于第二预设阈值。即当前检验迭代的目标数据位宽可以等于初始数据位宽。其中,图12中仅以举例的方式说明本公开一实施例的数据位宽确定方式,图12中各个操作的顺序可以适应性调整,此处并不作具体限定。
本公开实施例中,由于当前检验迭代的数据位宽发生变化时,会相应地带来点位置的变化。但此时点位置的变化并非是待量化数据的数据变动引起的,根据上述公式(30)确定的第一误差计算获得的目标迭代间隔可能不准确,从而会影响量化的精度。因此,在当前检验迭代的数据位宽发生变化时,可以相应地调整上述的第二均值,以保证第一误差能够准确地反映点位置的变动幅度,进而保证目标迭代间隔的准确性和可靠性。具体地,图13示出本公开另一实施例中第二均值的确定方法的流程图,如图13所示,上述方法还可以包括:
S116、根据所述目标数据位宽,确定所述当前检验迭代的数据位宽调整值;
具体地,处理器可以根据当前检验迭代的目标数据位宽和初始数据位宽,确定当前检验迭代的数据位宽调整值。其中,该数据位宽调整值=目标数据位宽-初始数据位宽。当然,处理器还可以直接获得当前检验迭代的数据位宽调整值。
S117、根据所述当前检验迭代的数据位宽调整值,更新上述的第二均值。
具体地,若数据位宽调整值大于预设参数(例如,该预设参数可以等于零)时,即当前检验迭代的数据位宽增加时,处理器可以相应地减少第二均值。若数据位宽调整值小于预设参数(例如,该预设参数可以等于零)时,即当前检验迭代的数据位宽减少时,处理器可以相应地增加第二均值。若数据位宽调整值等于预设参数,即当数据位宽调整值等于0时,此时当前检验迭代对应的待量化数据未发生改变,更新后的第二均值等于更新前的第二均值,该更新前的第二均值根据上述公式(29)计算获得。可选地,若数据位宽调整值等于预设参数,即当数据位宽调整值等于0时,处理器可以不更新第二均值,即处理器可以不执行上述操作S117。
例如,更新前的第二均值M2=β×s t+(1-β)×M1;当前检验迭代对应的目标数据位宽n2=初始数据位宽n1+Δn时,其中,Δn表示数据位宽调整值。此时,更新后的第二均值M2=β×(s t-Δn)+(1-β)×(M1-Δn)。当前检验迭代对应的目标数据位宽n2=初始数据位宽n1-Δn时,其中,Δn表示数据位宽调整值,此时,更新后的第二均值M2=β×(s t-Δn)+(1-β)×(M1+Δn),其中,s t是指当前检验迭代根据目标数据位宽确定的点位置。
再如,更新前的第二均值M2=β×s t+(1-β)×M1;当前检验迭代对应的目标数据位宽n2=初始数据位宽n1+Δn时,其中,Δn表示数据位宽调整值。此时,更新后的第二均值M2=β×s t+(1-β)×M1-Δn。再如,当前检验迭代对应的目标数据位宽n2=初始数据位宽n1-Δn时,其中,Δn表示数据位宽调整值,此时,更新后的第二均值M2=β×s t+(1-β)×M1+Δn,其中,s t是指当前检验迭代根据目标数据位宽确定的点位置。
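上述第二种更新方式(直接按数据位宽调整值平移第二均值)可以用如下草图示意,其中各取值仅为示意:

```python
def update_m2_for_bitwidth(m2, delta_n):
    """delta_n = 目标数据位宽 - 初始数据位宽。
    delta_n > 0(位宽增大)时减小第二均值,delta_n < 0(位宽减小)时增大第二均值,
    delta_n = 0 时第二均值保持不变。"""
    return m2 - delta_n

m2_new = update_m2_for_bitwidth(-3.0, 2)   # 位宽增大2,第二均值相应减小2
```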
进一步地,如图6所示,上述操作S200可以包括:
S210、根据点位置的变动幅度,确定第一目标迭代间隔,其中,该第一目标迭代间隔与上述的点位置的变动幅度负相关。即上述的点位置的变动幅度越大,则该第一目标迭代间隔越小。上述的点位置的变动幅度越小,则该第一目标迭代间隔越大。
如上所述,上述的第一误差可以表征点位置的变动幅度,因此,如图7所示,上述操作S210可以包括:
S211、处理器可以根据所述第一误差确定所述第一目标迭代间隔,其中,第一目标迭代间隔与所述第一误差负相关。即第一误差越大,则说明点位置的变化幅度越大,进而表明待量化数据的数据变动幅度越大,此时,第一目标迭代间隔越小。
具体地,处理器可以根据以下公式计算得到第一目标迭代间隔I:
I=δ/diff update1-γ
其中,I为第一目标迭代间隔,diff update1表示上述的第一误差,δ和γ可以为超参数。
可以理解的是,第一误差可以用于衡量点位置的变动幅度,第一误差越大,表明点位置的变动幅度越大,进而说明待量化数据的数据变动幅度越大,第一目标迭代间隔需要设置得越小。也就是说,第一误差越大,量化参数的调整越频繁。
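第一目标迭代间隔与第一误差的负相关关系可以用如下草图示意,其中δ、γ以及间隔的上下限均为假设的超参数:

```python
def first_target_interval(diff1, delta=128, gamma=2, i_min=1, i_max=64):
    """按 I = δ/diff_update1 - γ 计算第一目标迭代间隔,并截断到[i_min, i_max](示意)。"""
    i = int(delta / diff1) - gamma
    return max(i_min, min(i_max, i))
```

第一误差越大,计算得到的间隔越小,量化参数的调整也就越频繁。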
在本实施例中,通过计算点位置的变动幅度(第一误差),并根据点位置的变动幅度确定第一目标迭代间隔。由于量化参数根据第一目标迭代间隔确定,也就使得根据量化参数进行量化得到的量化数据,能够更加符合目标数据的点位置的变动趋势,在保证量化精度的同时,提高循环神经网络的运行效率。
可选地,处理器在当前检验迭代处确定第一目标迭代间隔后,可以进一步在当前检验迭代处确定第一目标迭代间隔对应的量化参数和数据位宽等参数,从而根据第一目标迭代间隔更新量化参数。其中,量化参数可以包括点位置和/或缩放系数。进一步地,该量化参数还可以包括偏移量。该量化参数的具体计算方式可参见上文中的描述。图14示出本公开另一实施例的量化参数调整方法的流程图,如图14所示,上述方法还可以包括:
S300、处理器根据第一目标迭代间隔调整循环神经网络运算中的量化参数。
具体地,处理器可以根据第一目标迭代间隔以及每个周期中迭代的总数确定更新迭代(亦可称检验迭代),并在各个更新迭代处更新第一目标迭代间隔,还可以在各个更新迭代处更新量化参数。例如,循环神经网络运算中的数据位宽保持不变,此时,处理器可以在各个更新迭代处直接根据更新迭代的待量化数据,调整点位置等量化参数。再如,循环神经网络运算中的数据位宽可变,此时,处理器可以在各个更新迭代处更新数据位宽,并根据更新后的数据位宽和该更新迭代的待量化数据,调整点位置等量化参数。
本公开实施例中,处理器在各个检验迭代处更新量化参数,以保证当前量化参数满足待量化数据的量化需求。其中,更新前的第一目标迭代间隔与更新后的第一目标迭代间隔可以相同,也可以不同。更新前的数据位宽与更新后的数据位宽可以相同,也可以不同;即不同迭代间隔的数据位宽可以相同,也可以不同。更新前的量化参数与更新后的量化参数可以相同,也可以不同;即不同迭代间隔的量化参数可以相同,也可以不同。
可选地,上述操作S300中,处理器可以在更新迭代处确定第一目标迭代间隔中的量化参数,以调整循环神经网络运算中的量化参数。
在一种可能的实现方式中,当该方法用于循环神经网络的训练或微调过程中时,操作S200可以包括:
处理器确定当前检验迭代是否大于第一预设迭代,其中,在当前检验迭代大于第一预设迭代时,根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。在当前检验迭代小于或等于第一预设迭代时,根据预设迭代间隔调整所述量化参数。
其中,当前检验迭代是指处理器当前执行的迭代运算。可选地,该第一预设迭代可以是超参数,该第一预设迭代可以是根据待量化数据的数据变动曲线确定的,该第一预设迭代也可以是用户自定义设置的。可选地,该第一预设迭代可以小于一个周期(epoch)包含的迭代总数,其中,一个周期是指数据集中的所有待量化数据均完成一次正向运算和一次反向运算。
可选地,处理器可以读取用户输入的第一预设迭代,并根据该第一预设迭代与预设迭代间隔的对应关系,确定预设迭代间隔。可选地,该预设迭代间隔可以是超参数,该预设迭代间隔也可以是用户自定义设置的。此时,处理器可以直接读取用户输入的第一预设迭代和预设迭代间隔,并根据该预设迭代间隔更新循环神经网络运算中的量化参数。本公开实施例中,处理器无需根据待量化数据的数据变动幅度,确定目标迭代间隔。
例如,用户输入的第一预设迭代为第100次迭代,预设迭代间隔为5,则在当前检验迭代小于或等于第100次迭代时,可以根据预设迭代间隔更新量化参数。即处理器可以确定在循环神经网络的训练或微调的第1次迭代至第100次迭代,每间隔5次迭代更新一次量化参数。具体地,处理器可以确定第1次迭代对应的数据位宽n1及点位置s1等量化参数,并采用该数据位宽n1和点位置s1等量化参数对第1次迭代至第5次迭代的待量化数据进行量化,即第1次迭代至第5次迭代可以采用相同的量化参数。之后,处理器可以确定第6次迭代对应的数据位宽n2及点位置s2等量化参数,并采用该数据位宽n2和点位置s2等量化参数对第6次迭代至第10次迭代的待量化数据进行量化,即第6次迭代至第10次迭代可以采用相同的量化参数。同理,处理器可以按照上述量化方式直至完成第100次迭代。其中,每个迭代间隔中数据位宽及点位置等量化参数的确定方式可以参见上文的描述,此处不再赘述。
再如,用户输入的第一预设迭代为第100次迭代,预设迭代间隔为1,则在当前检验迭代小于或等于第100次迭代时,可以根据预设迭代间隔更新量化参数。即处理器可以确定在循环神经网络的训练或微调的第1次迭代至第100次迭代,每次迭代均更新量化参数。具体地,处理器可以确定第1次迭代对应的数据位宽n1及点位置s1等量化参数,并采用该数据位宽n1和点位置s1等量化参数对第1次迭代的待量化数据进行量化。之后,处理器可以确定第2次迭代对应的数据位宽n2及点位置s2等量化参数,并采用该数据位宽n2和点位置s2等量化参数对第2次迭代的待量化数据进行量化,……。同理,处理器可以确定出第100次迭代的数据位宽n100以及点位置s100等量化参数,并采用该数据位宽n100和点位置s100等量化参数对第100次迭代的待量化数据进行量化。其中,每个迭代间隔中数据位宽及点位置等量化参数的确定方式可以参见上文的描述,此处不再赘述。
上文仅以数据位宽和量化参数同步更新的方式举例说明,在其他可选的实施例中,在每个目标迭代间隔中,处理器还可以根据点位置的变动幅度确定点位置的迭代间隔,并根据该点位置迭代间隔更新点位置等量化参数。
可选地,在当前检验迭代大于第一预设迭代时,可以表明循环神经网络的训练或微调处于中期阶段,此时可以获得历史迭代的待量化数据的数据变动幅度,并根据该待量化数据的数据变动幅度确定第一目标迭代间隔,该第一目标迭代间隔可以大于上述的预设迭代间隔,从而可以减少量化参数的更新次数,提高量化效率及运算效率。具体地,在所述当前检验迭代大于第一预设迭代时,根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
承接上例,用户输入的第一预设迭代为第100次迭代,预设迭代间隔为1,则在当前检验迭代小于或等于第100次迭代时,可以根据预设迭代间隔更新量化参数。即处理器可以确定在循环神经网络的训练或微调的第1次迭代至第100次迭代,每次迭代均更新量化参数,具体实现方式可以参见上文中的描述。在当前检验迭代大于第100次迭代时,处理器可以根据当前检验迭代的待量化数据及其之前的历史迭代的待量化数据,确定待量化数据的数据变动幅度,并根据该待量化数据的数据变动幅度确定第一目标迭代间隔。具体地,在当前检验迭代大于第100次迭代时,处理器可以自适应地调整当前检验迭代对应的数据位宽,获得该当前检验迭代对应的目标数据位宽,并将该当前检验迭代对应的目标数据位宽作为第一目标迭代间隔的数据位宽,其中,第一目标迭代间隔中迭代对应的数据位宽一致。同时,处理器可以根据当前检验迭代对应的目标数据位宽和待量化数据,确定当前检验迭代对应的点位置,并根据当前检验迭代对应的点位置确定第一误差。处理器还可以根据当前检验迭代对应的待量化数据,确定量化误差,并根据量化误差确定第二误差。之后,处理器可以根据第一误差和第二误差确定第一目标迭代间隔,该第一目标迭代间隔可以大于上述的预设迭代间隔。进一步地,处理器可以确定第一目标迭代间隔中的点位置或缩放系数等量化参数,具体确定方式可参见上文中的描述。
例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定第一目标迭代间隔的迭代间隔为3,则处理器可以确定该第一目标迭代间隔包括3次迭代,分别为第100次迭代、第101次迭代和第102次迭代。处理器还可以根据第100次迭代的待量化数据确定量化误差,并根据量化误差确定第二误差和第100次迭代对应的目标数据位宽,将该目标数据位宽作为第一目标迭代间隔对应的数据位宽,其中,第100次迭代、第101次迭代和第102次迭代对应的数据位宽均为该第100次迭代对应的目标数据位宽。处理器还可以根据该第100次迭代的待量化数据和该第100次迭代对应的目标数据位宽确定该第100次迭代对应的点位置和缩放系数等量化参数。之后,采用该第100次迭代对应的量化参数对第100次迭代、第101次迭代和第102次迭代进行量化。
在一种可能的实现方式中,操作S200还可以包括:
在当前检验迭代大于或等于第二预设迭代、且当前检验迭代需要进行量化参数调整时,根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔;
根据所述第二目标迭代间隔确定出与所述当前检验迭代相对应的更新迭代,以在所述更新迭代中调整所述量化参数,所述更新迭代为所述当前检验迭代之后的迭代;
其中,所述第二预设迭代大于第一预设迭代,所述循环神经网络的量化调整过程包括多个周期,所述多个周期中迭代的总数不一致。
在当前检验迭代大于第一预设迭代时,处理器可以进一步确定当前检验迭代是否大于第二预设迭代。其中,所述第二预设迭代大于所述第一预设迭代,所述第二预设迭代间隔大于所述预设迭代间隔。可选地,上述第二预设迭代可以是超参数,第二预设迭代可以大于至少一个周期的迭代总数。可选地,第二预设迭代可以根据待量化数据的数据变动曲线确定。可选地,第二预设迭代也可以是用户自定义设置的。
在一种可能的实现方式中,根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔,包括:
根据当前检验迭代在当前周期中的迭代排序数和所述当前周期之后的周期中迭代的总数,确定出对应所述当前检验迭代的更新周期,所述更新周期中迭代的总数大于或等于所述迭代排序数;
根据所述第一目标迭代间隔、所述迭代排序数、所述当前周期与更新周期之间的周期中迭代的总数,确定出所述第二目标迭代间隔。
举例来说,如图5c所示,假定第一目标迭代间隔I=1。在第一周期iter 1的t 1迭代中确定需要更新量化参数,则第一周期iter 1的t 1迭代所对应的下一个更新迭代可以为第二周期iter 2中的t 1迭代。在第一周期iter 1的t 2迭代中确定需要更新量化参数,由于第一周期iter 1的t 2迭代的迭代排序数3大于第二周期的迭代总数,则第一周期iter 1的t 2迭代所对应的下一个更新迭代会变为第三周期iter 3中的t 2迭代。在第一周期iter 1的t 3迭代中确定需要更新量化参数,由于第一周期iter 1的t 3迭代的迭代排序数4大于第二周期、第三周期的迭代总数,则第一周期iter 1的t 3迭代所对应的下一个更新迭代会变为第四周期iter 4中的t 3迭代。
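按照图5c的例子(各周期迭代总数分别为4、2、3、5),上述更新周期与第二目标迭代间隔的确定过程可以用如下草图示意。其中"按第一目标迭代间隔推进周期""更新迭代保持相同排序数""第二目标迭代间隔按途经迭代数累计"均是对上文举例的一种假设性解读:

```python
def second_target_interval(order, interval, cur_cycle, cycle_sizes):
    """order: 当前检验迭代在当前周期中的迭代排序数(从1开始);
    interval: 第一目标迭代间隔(此处假设按周期推进);
    返回 (更新周期的下标, 第二目标迭代间隔)。"""
    c = cur_cycle + interval
    while cycle_sizes[c] < order:      # 更新周期中迭代总数需大于或等于迭代排序数
        c += 1
    gap = cycle_sizes[cur_cycle] - order       # 当前周期内剩余的迭代数
    gap += sum(cycle_sizes[cur_cycle + 1:c])   # 被跳过的周期中迭代的总数
    gap += order                               # 更新周期内到同排序数迭代的数量
    return c, gap

sizes = [4, 2, 3, 5]   # 图5c中iter1~iter4的迭代总数
```

例如,iter 1的t 2迭代(排序数3)因iter 2只有2次迭代而顺延到iter 3的t 2迭代,与上文举例一致。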
这样,处理器可以根据预设迭代间隔和第二目标迭代间隔对量化参数和第一目标迭代间隔进行更新。为便于描述,本文中将实际用于更新量化参数和第一目标迭代间隔的预设迭代间隔和第二目标迭代间隔统称为参考迭代间隔或目标迭代间隔。
在一种情况下,该循环神经网络运算中的各次迭代对应的数据位宽均不发生变化,即该循环神经网络运算中各次迭代对应的数据位宽均相同,此时,处理器可以通过确定参考迭代间隔中的点位置等量化参数,实现根据参考迭代间隔对循环神经网络运算中的量化参数的调整的目的。其中,该参考迭代间隔中迭代对应的量化参数可以是一致的。也就是说,参考迭代间隔中的各次迭代均采用同一点位置,仅在各次检验迭代处更新确定点位置等量化参数,从而可以避免每次迭代都对量化参数进行更新调整,从而减少了量化过程中的计算量,提高了量化操作的效率。
可选地,针对上述数据位宽不变的情况,参考迭代间隔中迭代对应的点位置可以保持一致。具体地,处理器可以根据当前检验迭代的待量化数据和该当前检验迭代对应的目标数据位宽,确定当前检验迭代对应的点位置,并将该当前检验迭代对应的点位置作为该参考迭代间隔对应的点位置,该参考迭代间隔中迭代均沿用当前检验迭代对应的点位置。可选地,该当前检验迭代对应的目标数据位宽可以是超参数。例如,该当前检验迭代对应的目标数据位宽是由用户自定义输入的。该当前检验迭代对应的点位置可以参见上文的公式(2)或公式(14)计算。
在一种情况下,该循环神经网络运算中的各次迭代对应的数据位宽可以发生变化,即不同参考迭代间隔对应的数据位宽可以不一致,但参考迭代间隔中各次迭代的数据位宽保持不变。其中,该参考迭代间隔中迭代对应的数据位宽可以是超参数,例如,该参考迭代间隔中迭代对应的数据位宽可以是用户自定义输入的。在一种情况下,该参考迭代间隔中迭代对应的数据位宽也可以是处理器计算获得的,例如,处理器可以根据当前检验迭代的待量化数据,确定当前检验迭代对应的目标数据位宽,并将该当前检验迭代对应的目标数据位宽作为参考迭代间隔对应的数据位宽。
此时,为简化量化过程中的计算量,该参考迭代间隔中对应的点位置等量化参数也可以保持不变。也就是说,参考迭代间隔中的各次迭代均采用同一点位置,仅在各次检验迭代处更新确定点位置等量化参数以及数据位宽,从而可以避免每次迭代都对量化参数进行更新调整,从而减少了量化过程中的计算量,提高了量化操作的效率。
可选地,针对上述参考迭代间隔对应的数据位宽不变的情况,参考迭代间隔中迭代对应的点位置可以保持一致。具体地,处理器可以根据当前检验迭代的待量化数据和该当前检验迭代对应的目标数据位宽,确定当前检验迭代对应的点位置,并将该当前检验迭代对应的点位置作为该参考迭代间隔对应的点位置,该参考迭代间隔中迭代均沿用当前检验迭代对应的点位置。可选地,该当前检验迭代对应的目标数据位宽可以是超参数。例如,该当前检验迭代对应的目标数据位宽是由用户自定义输入的。该当前检验迭代对应的点位置可以参见上文的公式(2)或公式(14)计算。
可选地,参考迭代间隔中迭代对应的缩放系数可以一致。处理器可以根据当前检验迭代的待量化数据,确定当前检验迭代对应的缩放系数,并将该当前检验迭代对应的缩放系数作为参考迭代间隔中各次迭代的缩放系数。其中,该参考迭代间隔中迭代对应的缩放系数一致。
可选地,参考迭代间隔中迭代对应的偏移量一致。处理器可以根据当前检验迭代的待量化数据,确定当前检验迭代对应的偏移量,并将该当前检验迭代对应的偏移量作为参考迭代间隔中各次迭代的偏移量。进一步地,处理器还可以确定待量化数据所有元素中的最小值和最大值,并进一步确定点位置和缩放系数等量化参数,具体可参见上文中的描述。该参考迭代间隔中迭代对应的偏移量一致。
例如,该参考迭代间隔可以从当前检验迭代开始计算迭代数量,即参考迭代间隔对应的检验迭代可以是参考迭代间隔的起始迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定参考迭代间隔的迭代间隔为3,则处理器可以确定该参考迭代间隔包括3次迭代,分别为第100次迭代、第101次迭代和第102次迭代。进而处理器可以根据第100次迭代对应的待量化数据和目标数据位宽,确定该第100次迭代对应的点位置等量化参数,并可以采用该第100次迭代对应的点位置等量化参数对第100次迭代、第101次迭代和第102次迭代进行量化。这样,处理器在第101次迭代和第102次迭代无需计算点位置等量化参数,减少了量化过程中的计算量,提高了量化操作的效率。
可选地,参考迭代间隔还可以是从当前检验迭代的下一次迭代开始计算迭代数量,即该参考迭代间隔对应的检验迭代也可以是该参考迭代间隔的终止迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定参考迭代间隔的迭代间隔为3。则处理器可以确定该参考迭代间隔包括3次迭代,分别为第101次迭代、第102次迭代和第103次迭代。进而处理器可以根据第100次迭代对应的待量化数据和目标数据位宽,确定该第100次迭代对应的点位置等量化参数,并可以采用该第100次迭代对应的点位置等量化参数对第101次迭代、第102次迭代和第103次迭代进行量化。这样,处理器在第102次迭代和第103次迭代无需计算点位置等量化参数,减少了量化过程中的计算量,提高了量化操作的效率。
本公开实施例中,同一参考迭代间隔中各次迭代对应的数据位宽及量化参数均一致,即同一参考迭代间隔中各次迭代对应的数据位宽、点位置、缩放系数及偏移量均保持不变,从而在循环神经网络的训练或微调过程中,可以避免频繁地调整待量化数据的量化参数,减少了量化过程中的计算量,从而可以提高量化效率。并且,通过在训练或微调的不同阶段根据数据变动幅度,动态地调整量化参数,可以保证量化精度。
在另一情况下,该循环神经网络运算中的各次迭代对应的数据位宽可以发生变化,但参考迭代间隔中各次迭代的数据位宽保持不变。此时,参考迭代间隔中迭代对应的点位置等量化参数也可以不一致。处理器还可以根据当前检验迭代对应的目标数据位宽,确定参考迭代间隔对应的数据位宽,其中,参考迭代间隔中迭代对应的数据位宽一致。之后,处理器可以根据该参考迭代间隔对应的数据位宽和点位置迭代间隔,调整循环神经网络运算过程中的点位置等量化参数。可选地,图15示出本公开一实施例的量化参数调整方法中调整量化参数的流程图,如图15所示,上述操作S300还可以包括:
S310、根据当前检验迭代的待量化数据,确定参考迭代间隔对应的数据位宽;其中,该参考迭代间隔中迭代对应的数据位宽一致。也就是说,循环神经网络运算过程中的数据位宽每隔一个参考迭代间隔更新一次。可选地,该参考迭代间隔对应的数据位宽可以为当前检验迭代的目标数据位宽。该当前检验迭代的目标数据位宽可参见上文中的操作S114和S115,此处不再赘述。
例如,该参考迭代间隔可以从当前检验迭代开始计算迭代数量,即参考迭代间隔对应的检验迭代可以是参考迭代间隔的起始迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定参考迭代间隔的迭代间隔为6,则处理器可以确定该参考迭代间隔包括6次迭代,分别为第100次迭代至第105次迭代。此时,处理器可以确定第100次迭代的目标数据位宽,且第101次迭代至第105次迭代沿用该第100次迭代的目标数据位宽,无需在第101次迭代至第105次迭代计算目标数据位宽,从而减少计算量,提高量化效率及运算效率。之后,第106次迭代可以作为当前检验迭代,并重复上述确定参考迭代间隔,以及更新数据位宽的操作。
可选地,参考迭代间隔还可以是从当前检验迭代的下一次迭代开始计算迭代数量,即该参考迭代间隔对应的检验迭代也可以是该参考迭代间隔的终止迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定参考迭代间隔的迭代间隔为6。则处理器可以确定该参考迭代间隔包括6次迭代,分别为第101次迭代至第106次迭代。此时,处理器可以确定第100次迭代的目标数据位宽,且第101次迭代至106次迭代沿用该第100次迭代的目标数据位宽,无需在第101次迭代至106次迭代计算目标数据位宽,从而减少计算量,提高量化效率及运算效率。之后,第106次迭代可以作为当前检验迭代,并重复上述确定参考迭代间隔,以及更新数据位宽的操作。
S320、处理器根据获取的点位置迭代间隔和所述参考迭代间隔对应的数据位宽,调整所述参考迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置等量化参数。
其中,所述点位置迭代间隔包含至少一次迭代,所述点位置迭代间隔中迭代的点位置一致。可选地,该点位置迭代间隔可以是超参数,例如,该点位置迭代间隔可以是用户自定义输入的。
可选地,所述点位置迭代间隔小于或等于所述参考迭代间隔。当该点位置迭代间隔与上述的参考迭代间隔相同时,处理器可以在当前检验迭代处同步更新数据位宽和点位置等量化参数。进一步可选地,参考迭代间隔中迭代对应的缩放系数可以一致。更进一步地,参考迭代间隔中迭代对应的偏移量一致。此时,该参考迭代间隔中的迭代对应的数据位宽和点位置等量化参数均相同,从而可以降低计算量,提高量化效率和运算效率。具体实现过程与上述实施例基本一致,可参见上文的描述,此处不再赘述。
当点位置迭代间隔小于上述的参考迭代间隔时,处理器可以在参考迭代间隔对应的检验迭代处更新数据位宽和点位置等量化参数,并在该点位置迭代间隔确定的子检验迭代处更新点位置等量化参数。由于在数据位宽不变的情况下,点位置等量化参数可以根据待量化数据进行微调,因此,在同一参考迭代间隔内也可以对点位置等量化参数进行调整,以进一步提高量化精度。
具体地,处理器可以根据当前检验迭代和点位置迭代间隔确定子检验迭代,该子检验迭代用于调整点位置,该子检验迭代可以是参考迭代间隔中的迭代。进一步地,处理器可以根据子检验迭代的待量化数据和参考迭代间隔对应的数据位宽,调整参考迭代间隔中迭代对应的点位置,其中,点位置的确定方式可以参照上述的公式(2)或公式(14),此处不再赘述。
例如,当前检验迭代为第100次迭代,该参考迭代间隔为6,该参考迭代间隔包含的迭代为第100次迭代至第105次迭代。处理器获取的点位置迭代间隔为I s1=3,则可以从当前检验迭代开始间隔三次迭代调整一次点位置。具体地,处理器可以将第100次迭代作为上述的子检验迭代,并计算获得该第100次迭代对应的点位置s1,在第100次迭代、第101次迭代和第102次迭代共用点位置s1进行量化。之后,处理器可以根据点位置迭代间隔I s1将第103次迭代作为上述的子检验迭代,同时处理器还可以根据第103次迭代对应的待量化数据和参考迭代间隔对应的数据位宽n确定第二个点位置迭代间隔对应的点位置s2,则在第103次迭代至第105次迭代中可以共用上述的点位置s2进行量化。本公开实施例中,上述的更新前的点位置s1与更新后的点位置s2的值可以相同,也可以不同。进一步地,处理器可以在第106次迭代重新根据待量化数据的数据变动幅度,确定下一参考迭代间隔以及该下一参考迭代间隔对应的数据位宽及点位置等量化参数。
再如,当前检验迭代为第100次迭代,该参考迭代间隔为6,该参考迭代间隔包含的迭代为第101次迭代至第106次迭代。处理器获取的点位置迭代间隔为I s1=3,则可以从当前检验迭代开始间隔三次迭代调整一次点位置。具体地,处理器可以根据当前检验迭代的待量化数据和当前检验迭代对应的目标数据位宽n1,确定第一个点位置迭代间隔对应的点位置为s1,则在第101次迭代、第102次迭代和103次迭代共用上述的点位置s1进行量化。之后,处理器可以根据点位置迭代间隔I s1将第104次迭代作为上述的子检验迭代,同时处理器还可以根据第104次迭代对应的待量化数据和参考迭代间隔对应的数据位宽n1确定第二个点位置迭代间隔对应的点位置s2,则在第104次迭代至第106次迭代中可以共用上述的点位置s2进行量化。本公开实施例中,上述的更新前的点位置s1与更新后的点位置s2的值可以相同,也可以不同。进一步地,处理器可以在106次迭代重新根据待量化数据的数据变动幅度,确定下一参考迭代间隔以及该下一参考迭代间隔对应的数据位宽及点位置等量化参数。
可选地,该点位置迭代间隔可以等于1,即每次迭代均更新一次点位置。可选地,该点位置迭代间隔可以相同,也可以不同。例如,该参考迭代间隔包含的至少一个点位置迭代间隔可以是依次增大的。此处仅以举例的说明本实施例的实现方式,并不用于限定本公开。
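在参考迭代间隔内按点位置迭代间隔确定子检验迭代的方式可以用如下草图示意,迭代编号与间隔取值沿用上文举例:

```python
def sub_check_iterations(start, ref_interval, pos_interval):
    """从参考迭代间隔的起始迭代start开始,每隔pos_interval次迭代
    确定一个用于更新点位置的子检验迭代(示意实现,假设点位置迭代间隔恒定)。"""
    return list(range(start, start + ref_interval, pos_interval))

subs = sub_check_iterations(start=100, ref_interval=6, pos_interval=3)
```

即第100次迭代与第103次迭代为子检验迭代,分别更新确定点位置s1与s2,与上文举例一致;点位置迭代间隔取1时则每次迭代均更新一次点位置。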
可选地,该参考迭代间隔中迭代对应的缩放系数也可以不一致。进一步可选地,该缩放系数可以与上述的点位置同步更新,也就是说,该缩放系数对应的迭代间隔可以等于上述的点位置迭代间隔。即每当处理器更新确定点位置时,会相应地更新确定缩放系数。
可选地,该参考迭代间隔中迭代对应的偏移量也可以不一致。进一步地,该偏移量可以与上述的点位置同步更新,也就是说,该偏移量对应的迭代间隔可以等于上述的点位置迭代间隔。即每当处理器更新确定点位置时,会相应地更新确定偏移量。当然,该偏移量也可以与上述地点位置或数据位宽异步更新,此处不作具体限定。更进一步地,处理器还可以确定待量化数据所有元素中的最小值和最大值,并进一步确定点位置和缩放系数等量化参数,具体可参见上文中的描述。
在另一种实施例中,处理器可以根据点位置的变动幅度和待量化数据的数据位宽的变化,综合确定待量化数据的数据变动幅度,并根据该待量化数据的数据变动幅度确定参考迭代间隔,其中,该参考迭代间隔可以用于更新确定数据位宽,即处理器可以在每个参考迭代间隔的检验迭代处更新确定数据位宽。由于点位置可以反映定点数据的精度,数据位宽可以反映定点数据的数据表示范围,因而通过综合点位置的变动幅度和待量化数据的数据位宽变化,可以保证量化后的数据既能够兼顾精度,也能够满足数据表示范围。可选地,点位置的变化幅度可以采用上述的第一误差进行表征,数据位宽的变化可以根据上述的量化误差进行确定。具体地,图16示出本公开另一实施例的参数调整方法中第一目标迭代间隔的确定方法的流程图,如图16所示,上述方法可以包括:
S400、获取第一误差,第一误差能够表征点位置的变动幅度,该点位置的变动幅度可以表示待量化数据的数据变动幅度;具体地,上述第一误差的计算方式可参见上文中操作S110中的描述,此处不再赘述。
S500、获取第二误差,所述第二误差用于表征所述数据位宽的变化。
可选地,上述的第二误差可以根据量化误差进行确定,该第二误差与上述的量化误差正相关。在一种可能的实现方式中,上述操作S500可以包括:
根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差;
根据所述量化误差确定所述第二误差,所述第二误差与所述量化误差正相关。
其中,所述当前检验迭代的量化数据是根据初始数据位宽对所述当前检验迭代的待量化数据进行量化获得的。其中,具体的量化误差确定方式可参见上文中操作S114中的描述,此处不再赘述。
具体地,第二误差可以按照如下公式进行计算:
diff update2=θ*diff bit 2    公式(34)
其中,diff update2表示上述的第二误差,diff bit表示上述的量化误差,θ可以为超参数。
S600、根据所述第二误差和所述第一误差,确定所述第一目标迭代间隔。
具体地,处理器可以根据第一误差和第二误差计算获得目标误差,并根据目标误差确定目标迭代间隔。可选地,目标误差可以是第一误差和第二误差进行加权平均计算获得。例如,目标误差=K*第一误差+(1-K)*第二误差,其中,K为超参数。之后,处理器可以根据该目标误差确定目标迭代间隔,目标迭代间隔与该目标误差负相关。即目标误差越大,目标迭代间隔越小。
可选地,该目标误差还可以根据第一误差和第二误差中的最值进行确定,此时第一误差或第二误差的权重取值为0。在一种可能的实现方式中,上述操作S600可以包括:
将所述第一误差和所述第二误差中最大值作为目标误差;
根据所述目标误差确定所述第一目标迭代间隔,其中,所述目标误差与所述第一目标迭代间隔负相关。
具体地,处理器可以比较第一误差diff update1和第二误差diff update2的大小,当第一误差diff update1大于第二误差diff update2时,则该目标误差等于第一误差diff update1。当第一误差diff update1小于第二误差diff update2时,则该目标误差等于第二误差diff update2。当第一误差diff update1等于第二误差diff update2时,则该目标误差可以是第一误差diff update1或第二误差diff update2。即目标误差diff update可以按照如下公式进行确定:
diff update=max(diff update1,diff update2)     公式(35)
其中,diff update是指目标误差,diff update1是指第一误差,diff update2是指第二误差。
具体地,可以根据以下公式计算得到第一目标迭代间隔:
I=δ/diff update-γ
其中,I表示目标迭代间隔,diff update表示上述的目标误差,δ和γ可以为超参数。
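综合公式(34)、公式(35)与上述间隔计算公式,数据位宽可变时目标迭代间隔的确定可以示意如下,其中θ、δ、γ及间隔上下限均为假设的超参数:

```python
def target_interval(diff1, diff_bit, theta=10.0, delta=128, gamma=2,
                    i_min=1, i_max=64):
    """第二误差 diff_update2 = θ×diff_bit²(公式(34));
    目标误差取第一误差与第二误差中的最大值(公式(35));
    再按 I = δ/diff_update - γ 计算间隔并截断到[i_min, i_max](示意)。"""
    diff2 = theta * diff_bit ** 2
    diff = max(diff1, diff2)
    return max(i_min, min(i_max, int(delta / diff) - gamma))
```

无论是点位置变动剧烈(第一误差大)还是数据位宽需要调整(第二误差大),目标迭代间隔都会相应变小,从而更频繁地更新量化参数。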
可选地,上述实施例中,循环神经网络运算中数据位宽可变,并可以通过第二误差衡量数据位宽的变化趋势。此种情况下,处理器在确定第一目标迭代间隔后,可以确定第二目标迭代间隔以及确定第二目标迭代间隔中迭代对应的数据位宽,其中,该第二目标迭代间隔中迭代对应的数据位宽一致。具体地,处理器可以根据当前检验迭代的待量化数据,确定第二目标迭代间隔对应的数据位宽。也就是说,循环神经网络运算过程中的数据位宽每隔一个第二目标迭代间隔更新一次。可选地,该第二目标迭代间隔对应的数据位宽可以为当前检验迭代的目标数据位宽。该当前检验迭代的目标数据位宽可参见上文中的操作S114和S115,此处不再赘述。
例如,该第二目标迭代间隔可以从当前检验迭代开始计算迭代数量,即第二目标迭代间隔对应的检验迭代可以是第二目标迭代间隔的起始迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定第二目标迭代间隔的迭代间隔为6,则处理器可以确定该第二目标迭代间隔包括6次迭代,分别为第100次迭代至第105次迭代。此时,处理器可以确定第100次迭代的目标数据位宽,且第101次迭代至第105次迭代沿用该第100次迭代的目标数据位宽,无需在第101次迭代至第105次迭代计算目标数据位宽,从而减少计算量,提高量化效率及运算效率。之后,第106次迭代可以作为当前检验迭代,并重复上述确定第二目标迭代间隔,以及更新数据位宽的操作。
可选地,第二目标迭代间隔还可以是从当前检验迭代的下一次迭代开始计算迭代数量,即该第二目标迭代间隔对应的检验迭代也可以是该第二目标迭代间隔的终止迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定第二目标迭代间隔的迭代间隔为6。则处理器可以确定该第二目标迭代间隔包括6次迭代,分别为第101次迭代至第106次迭代。此时,处理器可以确定第100次迭代的目标数据位宽,且第101次迭代至106次迭代沿用该第100次迭代的目标数据位宽,无需在第101次迭代至106次迭代计算目标数据位宽,从而减少计算量,提高量化效率及运算效率。之后,第106次迭代可以作为当前检验迭代,并重复上述确定目标迭代间隔,以及更新数据位宽的操作。
再进一步地,处理器还可以在检验迭代处确定第二目标迭代间隔中的量化参数,以根据第二目标迭代间隔调整循环神经网络运算中的量化参数。即该循环神经网络运算中的点位置等量化参数可以与数据位宽同步更新。
在一种情况下,该第二目标迭代间隔中迭代对应的量化参数可以是一致的。可选地,处理器可以根据当前检验迭代的待量化数据和该当前检验迭代对应的目标数据位宽,确定当前检验迭代对应的点位置,并将该当前检验迭代对应的点位置作为该第二目标迭代间隔对应的点位置,其中该第二目标迭代间隔中迭代对应的点位置一致。也就是说,第二目标迭代间隔中的各次迭代均沿用当前检验迭代的点位置等量化参数,避免了每次迭代都对量化参数进行更新调整,从而减少了量化过程中的计算量,提高了量化操作的效率。
可选地,第二目标迭代间隔中迭代对应的缩放系数可以一致。处理器可以根据当前检验迭代的待量化数据,确定当前检验迭代对应的缩放系数,并将该当前检验迭代对应的缩放系数作为第二目标迭代间隔中各次迭代的缩放系数。其中,该第二目标迭代间隔中迭代对应的缩放系数一致。
可选地,第二目标迭代间隔中迭代对应的偏移量一致。处理器可以根据当前检验迭代的待量化数据,确定当前检验迭代对应的偏移量,并将该当前检验迭代对应的偏移量作为第二目标迭代间隔中各次迭代的偏移量。进一步地,处理器还可以确定待量化数据所有元素中的最小值和最大值,并进一步确定点位置和缩放系数等量化参数,具体可参见上文中的描述。该第二目标迭代间隔中迭代对应的偏移量一致。
例如,该第二目标迭代间隔可以从当前检验迭代开始计算迭代数量,即第二目标迭代间隔对应的检验迭代可以是第二目标迭代间隔的起始迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定第二目标迭代间隔的迭代间隔为3,则处理器可以确定该第二目标迭代间隔包括3次迭代,分别为第100次迭代、第101次迭代和第102次迭代。进而处理器可以根据第100次迭代对应的待量化数据和目标数据位宽,确定该第100次迭代对应的点位置等量化参数,并可以采用该第100次迭代对应的点位置等量化参数对第100次迭代、第101次迭代和第102次迭代进行量化。这样,处理器在第101次迭代和第102次迭代无需计算点位置等量化参数,减少了量化过程中的计算量,提高了量化操作的效率。
可选地,第二目标迭代间隔还可以是从当前检验迭代的下一次迭代开始计算迭代数量,即该第二目标迭代间隔对应的检验迭代也可以是该第二目标迭代间隔的终止迭代。例如,当前检验迭代为第100次迭代,处理器根据待量化数据的数据变动幅度,确定第二目标迭代间隔的迭代间隔为3。则处理器可以确定该第二目标迭代间隔包括3次迭代,分别为第101次迭代、第102次迭代和第103次迭代。进而处理器可以根据第100次迭代对应的待量化数据和目标数据位宽,确定该第100次迭代对应的点位置等量化参数,并可以采用该第100次迭代对应的点位置等量化参数对第101次迭代、第102次迭代和第103次迭代进行量化。这样,处理器在第102次迭代和第103次迭代无需计算点位置等量化参数,减少了量化过程中的计算量,提高了量化操作的效率。
本公开实施例中,同一第二目标迭代间隔中各次迭代对应的数据位宽及量化参数均一致,即同一第二目标迭代间隔中各次迭代对应的数据位宽、点位置、缩放系数及偏移量均保持不变,从而在循环神经网络的训练或微调过程中,可以避免频繁地调整待量化数据的量化参数,减少了量化过程中的计算量,从而可以提高量化效率。并且,通过在训练或微调的不同阶段根据数据变动幅度,动态地调整量化参数,可以保证量化精度。
在另一种情况下,处理器还可以根据点位置等量化参数对应的点位置迭代间隔确定第二目标迭代间隔中的量化参数,以调整循环神经网络运算中的量化参数。即该循环神经网络运算中的点位置等量化参数可以与数据位宽异步更新,处理器可以在第二目标迭代间隔的检验迭代处更新数据位宽和点位置等量化参数,处理器还可以根据点位置迭代间隔单独更新第二目标迭代间隔中迭代对应的点位置。
具体地,处理器还可以根据当前检验迭代对应的目标数据位宽,确定第二目标迭代间隔对应的数据位宽,其中,第二目标迭代间隔中迭代对应的数据位宽一致。之后,处理器可以根据该第二目标迭代间隔对应的数据位宽和点位置迭代间隔,调整循环神经网络运算过程中的点位置等量化参数。在确定第二目标迭代间隔对应的数据位宽之后,根据获取的点位置迭代间隔和所述第二目标迭代间隔对应的数据位宽,调整所述第二目标迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置。其中,所述点位置迭代间隔包含至少一次迭代,所述点位置迭代间隔中迭代的点位置一致。可选地,该点位置迭代间隔可以是超参数,例如,该点位置迭代间隔可以是用户自定义输入的。
在一种可选的实施例中,上述的方法可以用于循环神经网络的训练或微调过程中,以实现对循环神经网络微调或训练过程涉及的运算数据的量化参数进行调整,以提高循环神经网络运算过程中涉及的运算数据的量化精度及效率。该运算数据可以是神经元数据、权值数据或梯度数据中的至少一种。如图5a所示,根据待量化数据的数据变动曲线可知,在训练或微调的初期阶段,各次迭代的待量化数据之间的差异性较大,待量化数据的数据变动幅度较为剧烈,此时目标迭代间隔的值可以较小,以及时地更新目标迭代间隔中的量化参数,保证量化精度。在训练或微调的中期阶段,待量化数据的数据变动幅度逐渐趋于平缓,此时可以增大目标迭代间隔的值,以避免频繁地更新量化参数,以提高量化效率及运算效率。在训练或微调的后期阶段,此时循环神经网络的训练或微调趋于稳定(即当循环神经网络的正向运算结果趋近于预设参考值时,该循环神经网络的训练或微调趋于稳定),此时可以继续增大目标迭代间隔的值,以进一步提高量化效率及运算效率。基于上述数据变动趋势,可以在循环神经网络的训练或微调的不同阶段采用不同的方式确定目标迭代间隔,以在保证量化精度的基础上,提高量化效率及运算效率。
进一步地,图17示出本公开再一实施例的量化参数调整方法的流程图,如图17所示,上述方法还可以包括:
在当前迭代大于第一预设迭代时,处理器还可以执行操作S712,即处理器可以进一步确定当前迭代是否大于第二预设迭代。其中,所述第二预设迭代大于所述第一预设迭代,所述第二预设迭代间隔大于所述第一预设迭代间隔。可选地,上述第二预设迭代可以是超参数,第二预设迭代可以大于至少一个训练周期的迭代总数。可选地,第二预设迭代可以根据待量化数据的数据变动曲线确定。可选地,第二预设迭代也可以是用户自定义设置的。
在所述当前迭代大于或等于第二预设迭代时,则处理器可以执行操作S714,将第二预设迭代间隔作为所述目标迭代间隔,并根据所述第二预设迭代间隔调整所述神经网络量化过程中的参数。在当前迭代大于第一预设迭代,且当前迭代小于第二预设迭代时,则处理器可以执行上述的操作S713,根据所述待量化数据的数据变动幅度确定目标迭代间隔,并根据所述目标迭代间隔调整量化参数。
可选地,处理器可以读取用户设置的第二预设迭代,并根据第二预设迭代与第二预设迭代间隔的对应关系,确定第二预设迭代间隔,该第二预设迭代间隔大于第一预设迭代间隔。可选地,当所述神经网络的收敛程度满足预设条件时,则确定所述当前迭代大于或等于第二预设迭代。例如,在当前迭代的正向运算结果趋近于预设参考值时,可以确定该神经网络的收敛程度满足预设条件,此时可以确定当前迭代大于或等于第二预设迭代。或者,在当前迭代对应的损失值小于或等于预设阈值时,则可以确定该神经网络的收敛程度满足预设条件。
可选地,上述的第二预设迭代间隔可以是超参数,该第二预设迭代间隔可以大于或等于至少一个训练周期的迭代总数。可选地,该第二预设迭代间隔可以是用户自定义设置的。处理器可以直接读取用户输入的第二预设迭代和第二预设迭代间隔,并根据该第二预设迭代间隔更新神经网络运算中的量化参数。例如,该第二预设迭代间隔可以等于一个训练周期的迭代总数,即每个训练周期(epoch)更新一次量化参数。
再进一步地,上述方法还包括:
当所述当前迭代大于或等于第二预设迭代,处理器还可以在每次检验迭代处确定当前数据位宽是否需要调整。如果当前数据位宽需要调整,则处理器可以从上述的操作S714切换至操作S713,以重新确定数据位宽,使得数据位宽能够满足待量化数据的需求。
具体地,处理器可以根据上述的第二误差确定数据位宽是否需要调整。处理器还可以执行上述操作S715,确定第二误差是否大于预设误差值,当所述当前迭代大于或等于第二预设迭代且所述第二误差大于预设误差值时,则切换执行操作S713,根据所述待量化数据的数据变动幅度确定迭代间隔,以根据所述迭代间隔重新确定所述数据位宽。若当前迭代大于或等于第二预设迭代,且第二误差小于或等于预设误差值,则继续执行操作S714,将第二预设迭代间隔作为所述目标迭代间隔,并根据所述第二预设迭代间隔调整所述神经网络量化过程中的参数。其中,预设误差值可以是根据量化误差对应的预设阈值确定的,当第二误差大于预设误差值时,此时说明数据位宽可能需要进一步调整,处理器可以根据所述待量化数据的数据变动幅度确定迭代间隔,以根据所述迭代间隔重新确定所述数据位宽。
例如,第二预设迭代间隔为一个训练周期的迭代总数。在当前迭代大于或等于第二预设迭代时,处理器可以按照第二预设迭代间隔更新量化参数,即每个训练周期(epoch)更新一次量化参数。此时,每个训练周期的起始迭代作为一个检验迭代,在每个训练周期的起始迭代处,处理器可以根据该检验迭代的待量化数据确定量化误差,根据量化误差确定第二误差,并根据如下公式确定第二误差是否大于预设误差值:
diff update2=θ*diff bit 2>T
其中,diff update2表示第二误差,diff bit表示量化误差,θ表示超参数,T表示预设误差值。可选地,该预设误差值可以等于第一预设阈值除以超参数。当然,该预设误差值也可以是超参数。例如,该预设误差值可以按照如下公式计算获得:T=th/10,其中,th表示第一预设阈值,超参数的取值为10。
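上述判断第二误差是否大于预设误差值、进而决定是否切换回按数据变动幅度确定迭代间隔的逻辑可以示意如下。其中θ取10、T=th/10均沿用上文举例的取值:

```python
def need_redetermine_bitwidth(diff_bit, th, theta=10.0):
    """第二误差 diff_update2 = θ×diff_bit²;预设误差值 T = th/10(文中举例取值)。
    返回True表示第二误差大于预设误差值,需重新按数据变动幅度确定迭代间隔及数据位宽。"""
    t = th / 10
    return theta * diff_bit ** 2 > t

flag_hi = need_redetermine_bitwidth(diff_bit=0.2, th=1.0)
flag_lo = need_redetermine_bitwidth(diff_bit=0.05, th=1.0)
```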
若第二误差diff update2大于预设误差值T,则说明数据位宽可能不能满足预设要求,此时,可以不再采用第二预设迭代间隔更新量化参数,处理器可以按照待量化数据的数据变动幅度确定目标迭代间隔,以保证数据位宽满足预设要求。即在第二误差diff update2大于预设误差值T时,处理器从上述的操作S714切换至上述的操作S713。
当然,在其他实施例中,处理器可以根据上述的量化误差,确定数据位宽是否需要调整。例如,第二预设迭代间隔为一个训练周期的迭代总数。在当前检验迭代大于或等于第二预设迭代时,处理器可以按照第二预设迭代间隔更新量化参数,即每个训练周期(epoch)更新一次量化参数。其中,每个训练周期的起始迭代作为一个检验迭代。在每个训练周期的起始迭代处,处理器可以根据该检验迭代的待量化数据确定量化误差,并在该量化误差大于或等于第一预设阈值时,则说明数据位宽可能不能满足预设要求,即处理器从上述的操作S714切换至上述的操作S713。
在一个可选的实施例中,上述的点位置、缩放系数和偏移量等量化参数可以通过显示装置进行显示。此时,用户可以通过显示装置获知循环神经网络运算过程中的量化参数,用户还可以自适应修改处理器确定的量化参数。同理,上述的数据位宽和目标迭代间隔等也可以通过显示装置进行显示。此时,用户可以通过显示装置获知循环神经网络运算过程中的目标迭代间隔和数据位宽等参数,用户还可以自适应修改处理器确定的目标迭代间隔和数据位宽等参数。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本公开并不受所描述的动作顺序的限制,因为依据本公开,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于可选实施例,所涉及的动作和模块并不一定是本公开所必须的。
本公开一实施例还提供了一种循环神经网络的量化参数调整装置200,该量化参数调整装置200可以设置于一处理器中。例如,该量化参数调整装置200可以置于通用处理器中,再如,该量化参数调整装置也可以置于一人工智能处理器中。图18示出本公开一实施例的量化参数调整装置的结构框图,如图18所示,该量化参数调整装置200可以包括:
获取模块210,用于获取待量化数据的数据变动幅度;
迭代间隔确定模块220,用于根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,其中,所述目标迭代间隔包括至少一次迭代,所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
在一种可能的实现方式中,所述装置还包括:
预设间隔确定模块,用于在当前检验迭代小于或等于第一预设迭代时,根据预设迭代间隔调整所述量化参数。
在一种可能的实现方式中,所述迭代间隔确定模块,还用于在当前检验迭代大于第一预设迭代时,根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
在一种可能的实现方式中,所述迭代间隔确定模块,包括:
第二目标迭代间隔确定子模块,在当前检验迭代大于或等于第二预设迭代、且当前检验迭代需要进行量化参数调整时,根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔;
更新迭代确定子模块,根据所述第二目标迭代间隔确定出与所述当前检验迭代相对应的更新迭代,以在所述更新迭代中调整所述量化参数,所述更新迭代为所述当前检验迭代之后的迭代;
其中,所述第二预设迭代大于第一预设迭代,所述循环神经网络的量化调整过程包括多个周期,所述多个周期中迭代的总数不一致。
在一种可能的实现方式中,所述第二目标迭代间隔确定子模块,包括:
更新周期确定子模块,根据当前检验迭代在当前周期中的迭代排序数和所述当前周期之后的周期中迭代的总数,确定出对应所述当前检验迭代的更新周期,所述更新周期中迭代的总数大于或等于所述迭代排序数;
确定子模块,根据所述第一目标迭代间隔、所述迭代排序数、所述当前周期与更新周期之间的周期中迭代的总数,确定出所述第二目标迭代间隔。
在一种可能的实现方式中,所述迭代间隔确定模块,还用于当所述循环神经网络的收敛程度满足预设条件时,则确定所述当前检验迭代大于或等于第二预设迭代。
在一种可能的实现方式中,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述装置还包括:
量化参数确定模块,用于根据当前检验迭代对应的目标数据位宽和所述当前检验迭代的待量化数据,确定参考迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置;
其中,所述参考迭代间隔中迭代对应的点位置一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔。
在一种可能的实现方式中,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述装置还包括:
数据位宽确定模块,用于根据所述当前检验迭代对应的目标数据位宽,确定参考迭代间隔对应的数据位宽,其中,所述参考迭代间隔中迭代对应的数据位宽一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔;
量化参数确定模块,用于根据获取的点位置迭代间隔和参考迭代间隔对应的数据位宽,调整所述参考迭代间隔中迭代对应的点位置,以调整所述神经网络运算中的点位置;
其中,所述点位置迭代间隔包含至少一次迭代,所述点位置迭代间隔中迭代的点位置一致。
在一种可能的实现方式中,所述点位置迭代间隔小于或等于所述参考迭代间隔。
在一种可能的实现方式中,所述量化参数还包括缩放系数,所述缩放系数与所述点位置同步更新。
在一种可能的实现方式中,所述量化参数还包括偏移量,所述偏移量与所述点位置同步更新。
在一种可能的实现方式中,所述数据位宽确定模块包括:
量化误差确定子模块,用于根据所述当前检验迭代的待量化数据和所述当前检验迭代的量化数据,确定量化误差,其中,所述当前检验迭代的量化数据是通过对所述当前检验迭代的待量化数据进行量化获得;
数据位宽确定子模块,用于根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽。
在一种可能的实现方式中,所述数据位宽确定单元用于根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽时,具体用于:
若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽;或者,
若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽。
在一种可能的实现方式中,所述数据位宽确定单元用于若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽时,具体用于:
若所述量化误差大于或等于第一预设阈值,则根据第一预设位宽步长确定第一中间数据位宽;
返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差小于所述第一预设阈值;其中,所述当前检验迭代的量化数据是根据所述第一中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
在一种可能的实现方式中,所述数据位宽确定单元用于若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽时,具体用于:
若所述量化误差小于或等于第二预设阈值,则根据第二预设位宽步长确定第二中间数据位宽;
返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差大于所述第二预设阈值;其中,所述当前检验迭代的量化数据是根据所述第二中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
在一种可能的实现方式中,所述获取模块包括:
第一获取模块,用于获取点位置的变动幅度;其中,所述点位置的变动幅度能够用于表征所述待量化数据的数据变动幅度,所述点位置的变动幅度与所述待量化数据的数据变动幅度正相关。
在一种可能的实现方式中,所述第一获取模块包括:
第一均值确定单元,用于根据当前检验迭代之前的上一检验迭代对应的点位置,以及所述上一检验迭代之前的历史迭代对应的点位置,确定第一均值,其中,所述上一检验迭代为所述目标迭代间隔之前的上一迭代间隔对应的检验迭代;
第二均值确定单元,用于根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值;其中,所述当前检验迭代对应的点位置根据所述当前检验迭代对应的目标数据位宽和待量化数据确定;
第一误差确定单元,用于根据所述第一均值和所述第二均值确定第一误差,所述第一误差用于表征所述点位置的变动幅度。
在一种可能的实现方式中,所述第二均值确定单元具体用于:
获取预设数量的中间滑动平均值,其中,各个所述中间滑动平均值是根据所述当前检验迭代之前所述预设数量的检验迭代确定的;
根据所述当前检验迭代的点位置以及所述预设数量的中间滑动平均值,确定所述第二均值。
在一种可能的实现方式中,所述第二均值确定单元具体用于根据所述当前检验迭代对应的点位置以及所述第一均值,确定所述第二均值。
在一种可能的实现方式中,所述第二均值确定单元用于根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值;
其中,所述当前检验迭代的数据位宽调整值根据所述当前检验迭代的目标数据位宽和初始数据位宽确定。
在一种可能的实现方式中,所述第二均值确定单元用于根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值时,具体用于:
当所述当前检验迭代的数据位宽调整值大于预设参数时,则根据所述当前检验迭代的数据位宽调整值减小所述第二均值;
当所述当前检验迭代的数据位宽调整值小于预设参数时,则根据所述当前检验迭代的数据位宽调整值增大所述第二均值。
在一种可能的实现方式中,所述迭代间隔确定模块用于根据所述第一误差确定所述目标迭代间隔,所述目标迭代间隔与所述第一误差负相关。
在一种可能的实现方式中,所述获取模块还包括:
第二获取模块,用于获取数据位宽的变化趋势;根据所述点位置的变动幅度和所述数据位宽的变化趋势,确定所述待量化数据的数据变动幅度。
在一种可能的实现方式中,所述迭代间隔确定模块还用于根据获取的第一误差和第二误差,确定所述目标迭代间隔;其中,所述第一误差用于表征点位置的变动幅度,所述第二误差用于表征数据位宽的变化趋势。
在一种可能的实现方式中,所述迭代间隔确定模块用于根据获取的第一误差和第二误差,确定所述目标迭代间隔时,具体用于:
将所述第一误差和所述第二误差中最大值作为目标误差;
根据所述目标误差确定所述目标迭代间隔,其中,所述目标误差与所述目标迭代间隔负相关。
在一种可能的实现方式中,所述第二误差根据量化误差确定;
其中,所述量化误差根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据确定,所述第二误差与所述量化误差正相关。
在一种可能的实现方式中,所述迭代间隔确定模块,还用于当所述当前检验迭代大于或等于第二预设迭代,且第二误差大于预设误差值时,则根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
应当清楚的是,本申请实施例各个模块或单元的工作原理与上述方法中各个操作的实现过程基本一致,具体可参见上文的描述,此处不再赘述。应该理解,上述的装置实施例仅是示意性的,本公开的装置还可通过其它的方式实现。例如,上述实施例中所述单元/模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。例如,多个单元、模块或组件可以结合,或者可以集成到另一个系统,或一些特征可以忽略或不执行。上述集成的单元/模块既可以采用硬件的形式实现,也可以采用软件程序模块的形式实现。所述集成的单元/模块如果以硬件的形式实现时,该硬件可以是数字电路,模拟电路等等。硬件结构的物理实现包括但不局限于晶体管,忆阻器等等。
所述集成的单元/模块如果以软件程序模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储器中。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储器中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储器包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
在一个实施例中,本公开还提供了一种计算机可读存储介质,该存储介质中存储有计算机程序,该计算机程序被处理器或装置执行时,实现如上述任一实施例中的方法。具体地,该计算机程序被处理器或装置执行时,实现如下方法:
获取待量化数据的数据变动幅度;
根据所述待量化数据的数据变动幅度,确定目标迭代间隔,以根据所述目标迭代间隔调整所述循环神经网络运算中的量化参数,其中,所述目标迭代间隔包括至少一次迭代,所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
应当清楚的是,本申请实施例各个操作的实现与上述方法中各个操作的实现过程基本一致,具体可参见上文的描述,此处不再赘述。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。上述实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
在一种可能的实现方式中,还公开了一种人工智能芯片,其包括了上述量化参数调整装置。
在一种可能的实现方式中,还公开了一种板卡,其包括存储器件、接口装置和控制器件以及上述人工智能芯片;其中,所述人工智能芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;所述存储器件,用于存储数据;所述接口装置,用于实现所述人工智能芯片与外部设备之间的数据传输;所述控制器件,用于对所述人工智能芯片的状态进行监控。
图19示出根据本公开实施例的板卡的结构框图,参阅图19,上述板卡除了包括上述芯片389以外,还可以包括其他的配套部件,该配套部件包括但不限于:存储器件390、接口装置391和控制器件392;
所述存储器件390与所述人工智能芯片通过总线连接,用于存储数据。所述存储器件可以包括多组存储单元393。每一组所述存储单元与所述人工智能芯片通过总线连接。可以理解,每一组所述存储单元可以是DDR SDRAM(英文:Double Data Rate SDRAM,双倍速率同步动态随机存储器)。
DDR不需要提高时钟频率就能加倍提高SDRAM的速度。DDR允许在时钟脉冲的上升沿和下降沿读出数据。DDR的速度是标准SDRAM的两倍。在一个实施例中，所述存储器件可以包括4组所述存储单元。每一组所述存储单元可以包括多个DDR4颗粒（芯片）。在一个实施例中，所述人工智能芯片内部可以包括4个72位DDR4控制器，上述72位DDR4控制器中64bit用于传输数据，8bit用于ECC校验。可以理解，当每一组所述存储单元中采用DDR4-3200颗粒时，数据传输的理论带宽可达到25600MB/s。
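上述 25600MB/s 的理论带宽可按如下方式核算（仅为算术核对）：

```python
# DDR4-3200 理论带宽核算(示意):
# 每个 72 位 DDR4 控制器中有 64 bit 用于传输数据, 等效数据速率为 3200 MT/s
rate_mt_per_s = 3200                 # 每秒百万次传输 (MT/s)
data_bits = 64                       # 用于数据传输的位数
bandwidth_mb_per_s = rate_mt_per_s * data_bits // 8
print(bandwidth_mb_per_s)            # 25600 (MB/s)
```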
在一个实施例中,每一组所述存储单元包括多个并联设置的双倍速率同步动态随机存储器。DDR在一个时钟周期内可以传输两次数据。在所述芯片中设置控制DDR的控制器,用于对每个所述存储单元的数据传输与数据存储的控制。
所述接口装置与所述人工智能芯片电连接。所述接口装置用于实现所述人工智能芯片与外部设备（例如服务器或计算机）之间的数据传输。例如在一个实施例中，所述接口装置可以为标准PCIE接口。比如，待处理的数据由服务器通过标准PCIE接口传递至所述芯片，实现数据转移。优选的，当采用PCIE 3.0 X 16接口传输时，理论带宽可达到16000MB/s。在另一个实施例中，所述接口装置还可以是其他的接口，本公开并不限制上述其他的接口的具体表现形式，所述接口装置能够实现转接功能即可。另外，所述人工智能芯片的计算结果仍由所述接口装置传送回外部设备（例如服务器）。
所述控制器件与所述人工智能芯片电连接。所述控制器件用于对所述人工智能芯片的状态进行监控。具体的，所述人工智能芯片与所述控制器件可以通过SPI接口电连接。所述控制器件可以包括单片机（Micro Controller Unit，MCU）。如所述人工智能芯片可以包括多个处理芯片、多个处理核或多个处理电路，可以带动多个负载。因此，所述人工智能芯片可以处于多负载和轻负载等不同的工作状态。通过所述控制器件可以实现对所述人工智能芯片中多个处理芯片、多个处理核和/或多个处理电路的工作状态的调控。
在一种可能的实现方式中,公开了一种电子设备,其包括了上述人工智能芯片。电子设备包括数据处理装置、机器人、电脑、打印机、扫描仪、平板电脑、智能终端、手机、行车记录仪、导航仪、传感器、摄像头、服务器、云端服务器、相机、摄像机、投影仪、手表、耳机、移动存储、可穿戴设备、交通工具、家用电器、和/或医疗设备。
所述交通工具包括飞机、轮船和/或车辆;所述家用电器包括电视、空调、微波炉、冰箱、电饭煲、加湿器、洗衣机、电灯、燃气灶、油烟机;所述医疗设备包括核磁共振仪、B超仪和/或心电图仪。
依照以下条款可以更好地理解本公开的内容:
条款A1.一种循环神经网络的量化参数调整方法,所述方法包括:
获取待量化数据的数据变动幅度;
根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,其中,所述第一目标迭代间隔包括至少一次迭代,所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
条款A2.根据条款A1所述的方法,所述方法还包括:
在当前检验迭代小于或等于第一预设迭代时,根据预设迭代间隔调整所述量化参数。
条款A3.根据条款A1所述的方法,根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,包括:
在当前检验迭代大于第一预设迭代时,根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
条款A4.根据条款A1至条款A3任一项所述的方法,根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,包括:
在当前检验迭代大于或等于第二预设迭代、且当前检验迭代需要进行量化参数调整时,根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔;
根据所述第二目标迭代间隔确定出与所述当前检验迭代相对应的更新迭代,以在所述更新迭代中调整所述量化参数,所述更新迭代为所述当前检验迭代之后的迭代;
其中,所述第二预设迭代大于第一预设迭代,所述循环神经网络的量化调整过程包括多个周期,所述多个周期中迭代的总数不一致。
条款A5.根据条款A4所述的方法,根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔,包括:
根据当前检验迭代在当前周期中的迭代排序数和所述当前周期之后的周期中迭代的总数,确定出对应所述当前检验迭代的更新周期,所述更新周期中迭代的总数大于或等于所述迭代排序数;
根据所述第一目标迭代间隔、所述迭代排序数、所述当前周期与更新周期之间的周期中迭代的总数,确定出所述第二目标迭代间隔。
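条款A5中"先确定更新周期、再由第一目标迭代间隔与各周期迭代总数组合出第二目标迭代间隔"的过程，可用如下草图示意；其中将"当前周期与更新周期之间各周期的迭代总数"直接累加到第一目标迭代间隔上的具体组合公式，为本文之外的假设：

```python
def determine_second_interval(first_interval, order_in_cycle,
                              cycle_totals, current_cycle_idx):
    """根据第一目标迭代间隔和各周期中迭代的总数, 确定与当前检验迭代
    对应的第二目标迭代间隔(示意实现)。

    先在当前周期之后找到迭代总数 >= 当前检验迭代的迭代排序数的更新周期,
    再将当前周期与更新周期之间各周期的迭代总数累加到第一目标迭代间隔上。
    """
    update_idx = None
    for idx in range(current_cycle_idx + 1, len(cycle_totals)):
        if cycle_totals[idx] >= order_in_cycle:  # 更新周期中迭代总数不小于排序数
            update_idx = idx
            break
    if update_idx is None:
        return first_interval                    # 找不到更新周期时退化为第一间隔
    between = sum(cycle_totals[current_cycle_idx + 1:update_idx])
    return first_interval + between
```

例如各周期迭代总数为 [5, 3, 6, 8]、当前检验迭代在第0个周期中排序数为4时，第1个周期总数3不足4，更新周期为第2个周期，两者之间周期的迭代总数3被计入第二目标迭代间隔。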
条款A6.根据条款A4所述的方法,根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,还包括:
当所述循环神经网络的收敛程度满足预设条件时，则确定所述当前检验迭代大于或等于第二预设迭代。
条款A7.根据条款A4所述的方法,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述方法还包括:
根据当前检验迭代对应的目标数据位宽和所述当前检验迭代的待量化数据,确定参考迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置;
其中,所述参考迭代间隔中迭代对应的点位置一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔。
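点位置这一量化参数的一种常见计算与使用方式可示意如下；本文并未给出点位置的具体公式，以下按定点量化的通行做法写出，属于本文之外的假设：

```python
import math

def point_position(abs_max, bit_width):
    """根据待量化数据的绝对值最大值与目标数据位宽计算点位置 s (示意)。"""
    return math.ceil(math.log2(abs_max / (2 ** (bit_width - 1) - 1)))

def quantize(x, s):
    """以 2 的 s 次幂为量化步长, 将数据量化到定点网格上。"""
    step = 2.0 ** s
    return round(x / step) * step
```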
条款A8.根据条款A4所述的方法,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述方法还包括:
根据所述当前检验迭代对应的目标数据位宽,确定参考迭代间隔对应的数据位宽,其中,所述参考迭代间隔中迭代对应的数据位宽一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔;
根据获取的点位置迭代间隔和所述参考迭代间隔对应的数据位宽,调整所述参考迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置;
其中,所述点位置迭代间隔包含至少一次迭代,所述点位置迭代间隔中迭代的点位置一致。
条款A9.根据条款A8所述的方法,所述点位置迭代间隔小于或等于所述参考迭代间隔。
条款A10.根据条款A7至条款A9任一项所述的方法,所述量化参数还包括缩放系数,所述缩放系数与所述点位置同步更新。
条款A11.根据条款A7至条款A9任一项所述的方法,所述量化参数还包括偏移量,所述偏移量与所述点位置同步更新。
条款A12.根据条款A7至条款A9任一项所述的方法,所述方法还包括:
根据所述当前检验迭代的待量化数据和所述当前检验迭代的量化数据，确定量化误差，其中，所述当前检验迭代的量化数据通过对所述当前检验迭代的待量化数据进行量化获得；
根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽。
条款A13.根据条款A12所述的方法,所述根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽,包括:
若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽;或者,
若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽。
条款A14.根据条款A13所述的方法,若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽,包括:
若所述量化误差大于或等于第一预设阈值,则根据第一预设位宽步长确定第一中间数据位宽;
返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差小于所述第一预设阈值;其中,所述当前检验迭代的量化数据是根据所述第一中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
条款A15.根据条款A13所述的方法,所述若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,包括:
若所述量化误差小于或等于第二预设阈值,则根据第二预设位宽步长确定第二中间数据位宽;
返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差大于所述第二预设阈值;其中,所述当前检验迭代的量化数据是根据所述第二中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
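条款A14与A15所述"按位宽步长增减数据位宽、直至量化误差落入阈值区间"的循环过程，可用如下Python草图示意；其中量化误差的度量方式与各阈值、步长取值均为本文之外的假设：

```python
def quant_error(data, n):
    """以 n 位对称量化的相对误差作为量化误差的一种假设度量。"""
    step = max(abs(v) for v in data) / (2 ** (n - 1) - 1)
    err = sum(abs(v - round(v / step) * step) for v in data)
    return err / sum(abs(v) for v in data)

def determine_target_bit_width(data, bit_width, th1, th2,
                               step1=2, step2=2, max_w=32, min_w=2):
    """量化误差 >= 第一预设阈值时按第一预设位宽步长增加位宽,
    量化误差 <= 第二预设阈值时按第二预设位宽步长减小位宽,
    直至误差落入两阈值之间, 得到目标数据位宽。"""
    err = quant_error(data, bit_width)
    if err >= th1:
        while err >= th1 and bit_width < max_w:
            bit_width += step1
            err = quant_error(data, bit_width)
    elif err <= th2:
        while err <= th2 and bit_width > min_w:
            bit_width -= step2
            err = quant_error(data, bit_width)
    return bit_width
```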
条款A16.根据条款A1至条款A15任一项所述的方法,所述获取待量化数据的数据变动幅度,包括:
获取点位置的变动幅度；其中，所述点位置的变动幅度能够用于表征所述待量化数据的数据变动幅度，所述点位置的变动幅度与所述待量化数据的数据变动幅度正相关。
条款A17.根据条款A16所述的方法,所述获取点位置的变动幅度,包括:
根据当前检验迭代之前的上一检验迭代对应的点位置,以及所述上一检验迭代之前的历史迭代对应的点位置,确定第一均值,其中,所述上一检验迭代为所述参考迭代间隔之前的上一迭代间隔对应的检验迭代;
根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值;其中,所述当前检验迭代对应的点位置根据所述当前检验迭代对应的目标数据位宽和待量化数据确定;
根据所述第一均值和所述第二均值确定第一误差,所述第一误差用于表征所述点位置的变动幅度。
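点位置滑动平均与第一误差的计算可示意如下；滑动平均的加权形式与平滑系数 alpha 的取值为本文之外的假设：

```python
def moving_average(prev_avg, point_pos, alpha=0.9):
    """点位置的滑动平均更新(示意): 由历史均值与当前点位置加权得到。"""
    return alpha * prev_avg + (1 - alpha) * point_pos

def first_error(m1, m2):
    """第一误差取第一均值与第二均值之差的绝对值, 表征点位置的变动幅度。"""
    return abs(m2 - m1)
```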
条款A18.根据条款A17所述的方法,根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值,包括:
获取预设数量的中间滑动平均值,其中,各个所述中间滑动平均值是根据所述当前检验迭代之前所述预设数量的检验迭代确定的;
根据所述当前检验迭代的点位置以及所述预设数量的中间滑动平均值,确定所述第二均值。
条款A19.根据条款A17所述的方法,所述根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值,包括:
根据所述当前检验迭代对应的点位置以及所述第一均值,确定所述第二均值。
条款A20.根据条款A17所述的方法,所述方法还包括:
根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值;其中,所述当前检验迭代的数据位宽调整值根据所述当前检验迭代的目标数据位宽和初始数据位宽确定。
条款A21.根据条款A20所述的方法,所述根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值,包括:
当所述当前检验迭代的数据位宽调整值大于预设参数时,则根据所述当前检验迭代的数据位宽调整值减小所述第二均值;
当所述当前检验迭代的数据位宽调整值小于预设参数时,则根据所述当前检验迭代的数据位宽调整值增大所述第二均值。
条款A22.根据条款A17所述的方法,所述根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,包括:
根据所述第一误差确定所述第一目标迭代间隔,所述第一目标迭代间隔与所述第一误差负相关。
条款A23.根据条款A16至条款A22任一项所述的方法,所述获取待量化数据的数据变动幅度,还包括:
获取数据位宽的变化趋势;
根据所述点位置的变动幅度和所述数据位宽的变化趋势,确定所述待量化数据的数据变动幅度。
条款A24.根据条款A23所述的方法,根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,还包括:
根据获取的第一误差和第二误差,确定所述第一目标迭代间隔;其中,所述第一误差用于表征点位置的变动幅度,所述第二误差用于表征数据位宽的变化趋势。
条款A25.根据条款A23所述的方法,根据获取的第一误差和第二误差,确定所述第一目标迭代间隔,包括:
将所述第一误差和所述第二误差中最大值作为目标误差;
根据所述目标误差确定所述第一目标迭代间隔,其中,所述目标误差与所述第一目标迭代间隔负相关。
条款A26.根据条款A24或条款A25所述的方法,所述第二误差根据量化误差确定;
其中，所述量化误差根据当前检验迭代中待量化数据和所述当前检验迭代的量化数据确定，所述第二误差与所述量化误差正相关。
条款A27.根据条款A4所述的方法,所述方法还包括:
当所述当前检验迭代大于或等于第二预设迭代,且第二误差大于预设误差值时,则根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
条款A28.根据条款A1-条款A27任一项所述的方法,所述待量化数据为神经元数据、权值数据或梯度数据中的至少一种。
条款A29.一种循环神经网络的量化参数调整装置,包括存储器和处理器,所述存储器存储有计算机程序,所述处理器执行所述计算机程序时,实现如条款A1-28任一项所述的方法的步骤。
条款A30.一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,所述计算机程序被执行时,实现如条款A1-条款A28任一项所述的方法的步骤。
条款A31.一种循环神经网络的量化参数调整装置,所述装置包括:
获取模块,用于获取待量化数据的数据变动幅度;
迭代间隔确定模块，用于根据所述待量化数据的数据变动幅度，确定第一目标迭代间隔，以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数，其中，所述第一目标迭代间隔包括至少一次迭代，所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
条款A32.根据条款A31所述的装置,所述装置还包括:
预设间隔确定模块,用于在当前检验迭代小于或等于第一预设迭代时,根据预设迭代间隔调整所述量化参数。
条款A33.根据条款A31所述的装置,
所述迭代间隔确定模块,还用于在当前检验迭代大于第一预设迭代时,根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
条款A34.根据条款A31至条款A33任一项所述的装置,所述迭代间隔确定模块,包括:
第二目标迭代间隔确定子模块，用于在当前检验迭代大于或等于第二预设迭代、且当前检验迭代需要进行量化参数调整时，根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔；
更新迭代确定子模块，用于根据所述第二目标迭代间隔确定出与所述当前检验迭代相对应的更新迭代，以在所述更新迭代中调整所述量化参数，所述更新迭代为所述当前检验迭代之后的迭代；
其中,所述第二预设迭代大于第一预设迭代,所述循环神经网络的量化调整过程包括多个周期,所述多个周期中迭代的总数不一致。
条款A35.根据条款A34所述的装置,所述第二目标迭代间隔确定子模块,包括:
更新周期确定子模块，用于根据当前检验迭代在当前周期中的迭代排序数和所述当前周期之后的周期中迭代的总数，确定出对应所述当前检验迭代的更新周期，所述更新周期中迭代的总数大于或等于所述迭代排序数；
确定子模块，用于根据所述第一目标迭代间隔、所述迭代排序数、所述当前周期与更新周期之间的周期中迭代的总数，确定出所述第二目标迭代间隔。
条款A36.根据条款A34所述的装置,
所述迭代间隔确定模块,还用于当所述循环神经网络的收敛程度满足预设条件时,则确定所述当前检验迭代大于或等于第二预设迭代。
条款A37.根据条款A34所述的装置,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述装置还包括:
量化参数确定模块,用于根据当前检验迭代对应的目标数据位宽和所述当前检验迭代的待量化数据,确定参考迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置;
其中,所述参考迭代间隔中迭代对应的点位置一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔。
条款A38.根据条款A34所述的装置,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述装置还包括:
数据位宽确定模块,用于根据所述当前检验迭代对应的目标数据位宽,确定参考迭代间隔对应的数据位宽,其中,所述参考迭代间隔中迭代对应的数据位宽一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔;
量化参数确定模块，用于根据获取的点位置迭代间隔和参考迭代间隔对应的数据位宽，调整所述参考迭代间隔中迭代对应的点位置，以调整所述循环神经网络运算中的点位置；
其中,所述点位置迭代间隔包含至少一次迭代,所述点位置迭代间隔中迭代的点位置一致。
条款A39.根据条款A38所述的装置,所述点位置迭代间隔小于或等于所述参考迭代间隔。
条款A40.根据条款A37至条款A39任一项所述的装置,所述量化参数还包括缩放系数,所述缩放系数与所述点位置同步更新。
条款A41.根据条款A37至条款A39任一项所述的装置,所述量化参数还包括偏移量,所述偏移量与所述点位置同步更新。
条款A42.根据条款A37至条款A39任一项所述的装置,所述数据位宽确定模块包括:
量化误差确定子模块，用于根据所述当前检验迭代的待量化数据和所述当前检验迭代的量化数据，确定量化误差，其中，所述当前检验迭代的量化数据通过对所述当前检验迭代的待量化数据进行量化获得；
数据位宽确定子模块,用于根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽。
条款A43.根据条款A42所述的装置，所述数据位宽确定子模块用于根据所述量化误差，确定所述当前检验迭代对应的目标数据位宽时，具体用于：
若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽;或者,
若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽。
条款A44.根据条款A43所述的装置，所述数据位宽确定子模块用于若所述量化误差大于或等于第一预设阈值，则增加所述当前检验迭代对应的数据位宽，获得所述当前检验迭代对应的目标数据位宽时，具体用于：
若所述量化误差大于或等于第一预设阈值,则根据第一预设位宽步长确定第一中间数据位宽;
返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差小于所述第一预设阈值;其中,所述当前检验迭代的量化数据是根据所述第一中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
条款A45.根据条款A43所述的装置，所述数据位宽确定子模块用于若所述量化误差小于或等于第二预设阈值，则减小所述当前检验迭代对应的数据位宽，获得所述当前检验迭代对应的目标数据位宽时，具体用于：
若所述量化误差小于或等于第二预设阈值,则根据第二预设位宽步长确定第二中间数据位宽;
返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差大于所述第二预设阈值;其中,所述当前检验迭代的量化数据是根据所述第二中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
条款A46.根据条款A31至条款A45任一项所述的装置,所述获取模块包括:
第一获取模块,用于获取点位置的变动幅度;其中,所述点位置的变动幅度能够用于表征所述待量化数据的数据变动幅度,所述点位置的变动幅度与所述待量化数据的数据变动幅度正相关。
条款A47.根据条款A46所述的装置,所述第一获取模块包括:
第一均值确定单元,用于根据当前检验迭代之前的上一检验迭代对应的点位置,以及所述上一检验迭代之前的历史迭代对应的点位置,确定第一均值,其中,所述上一检验迭代为所述目标迭代间隔之前的上一迭代间隔对应的检验迭代;
第二均值确定单元，用于根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置，确定第二均值；其中，所述当前检验迭代对应的点位置根据所述当前检验迭代对应的目标数据位宽和待量化数据确定；
第一误差确定单元,用于根据所述第一均值和所述第二均值确定第一误差,所述第一误差用于表征所述点位置的变动幅度。
条款A48.根据条款A47所述的装置,所述第二均值确定单元具体用于:
获取预设数量的中间滑动平均值,其中,各个所述中间滑动平均值是根据所述当前检验迭代之前所述预设数量的检验迭代确定的;
根据所述当前检验迭代的点位置以及所述预设数量的中间滑动平均值,确定所述第二均值。
条款A49.根据条款A47所述的装置,所述第二均值确定单元具体用于根据所述当前检验迭代对应的点位置以及所述第一均值,确定所述第二均值。
条款A50.根据条款A47所述的装置,所述第二均值确定单元用于根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值;
其中,所述当前检验迭代的数据位宽调整值根据所述当前检验迭代的目标数据位宽和初始数据位宽确定。
条款A51.根据条款A50所述的装置,所述第二均值确定单元用于根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值时,具体用于:
当所述当前检验迭代的数据位宽调整值大于预设参数时,则根据所述当前检验迭代的数据位宽调整值减小所述第二均值;
当所述当前检验迭代的数据位宽调整值小于预设参数时,则根据所述当前检验迭代的数据位宽调整值增大所述第二均值。
条款A52.根据条款A47所述的装置,所述迭代间隔确定模块用于根据所述第一误差确定所述目标迭代间隔,所述目标迭代间隔与所述第一误差负相关。
条款A53.根据条款A46至条款A52任一项所述的装置,所述获取模块还包括:
第二获取模块,用于获取数据位宽的变化趋势;根据所述点位置的变动幅度和所述数据位宽的变化趋势,确定所述待量化数据的数据变动幅度。
条款A54.根据条款A53所述的装置,所述迭代间隔确定模块还用于根据获取的第一误差和第二误差,确定所述目标迭代间隔;其中,所述第一误差用于表征点位置的变动幅度,所述第二误差用于表征数据位宽的变化趋势。
条款A55.根据条款A53所述的装置,所述迭代间隔确定模块用于根据获取的第一误差和第二误差,确定所述目标迭代间隔时,具体用于:
将所述第一误差和所述第二误差中最大值作为目标误差;
根据所述目标误差确定所述目标迭代间隔,其中,所述目标误差与所述目标迭代间隔负相关。
条款A56.根据条款A54或55所述的装置,所述第二误差根据量化误差确定;
其中,所述量化误差根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据确定,所述第二误差与所述量化误差正相关。
条款A57.根据条款A34所述的装置,
所述迭代间隔确定模块,还用于当所述当前检验迭代大于或等于第二预设迭代,且第二误差大于预设误差值时,则根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
以上已经描述了本公开的各实施例,上述说明是示例性的,并非穷尽性的,并且也不限于所公开的各实施例。在不偏离所说明的各实施例的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实施例的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其它普通技术人员能理解本文公开的各实施例。

Claims (57)

  1. 一种循环神经网络的量化参数调整方法,其特征在于,所述方法包括:
    获取待量化数据的数据变动幅度;
    根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,其中,所述第一目标迭代间隔包括至少一次迭代,所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    在当前检验迭代小于或等于第一预设迭代时,根据预设迭代间隔调整所述量化参数。
  3. 根据权利要求1所述的方法,其特征在于,根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,包括:
    在当前检验迭代大于第一预设迭代时,根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
  4. 根据权利要求1至3任一项所述的方法,其特征在于,根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,包括:
    在当前检验迭代大于或等于第二预设迭代、且当前检验迭代需要进行量化参数调整时,根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔;
    根据所述第二目标迭代间隔确定出与所述当前检验迭代相对应的更新迭代,以在所述更新迭代中调整所述量化参数,所述更新迭代为所述当前检验迭代之后的迭代;
    其中,所述第二预设迭代大于第一预设迭代,所述循环神经网络的量化调整过程包括多个周期,所述多个周期中迭代的总数不一致。
  5. 根据权利要求4所述的方法,其特征在于,根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔,包括:
    根据当前检验迭代在当前周期中的迭代排序数和所述当前周期之后的周期中迭代的总数,确定出对应所述当前检验迭代的更新周期,所述更新周期中迭代的总数大于或等于所述迭代排序数;
    根据所述第一目标迭代间隔、所述迭代排序数、所述当前周期与更新周期之间的周期中迭代的总数,确定出所述第二目标迭代间隔。
  6. 根据权利要求4所述的方法,其特征在于,根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数,还包括:
    当所述循环神经网络的收敛程度满足预设条件时,则确定所述当前检验迭代大于或等于第二预设迭代。
  7. 根据权利要求4所述的方法,其特征在于,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述方法还包括:
    根据当前检验迭代对应的目标数据位宽和所述当前检验迭代的待量化数据,确定参考迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置;
    其中,所述参考迭代间隔中迭代对应的点位置一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔。
  8. 根据权利要求4所述的方法,其特征在于,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述方法还包括:
    根据所述当前检验迭代对应的目标数据位宽,确定参考迭代间隔对应的数据位宽,其中,所述参考迭代间隔中迭代对应的数据位宽一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔;
    根据获取的点位置迭代间隔和所述参考迭代间隔对应的数据位宽,调整所述参考迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置;
    其中,所述点位置迭代间隔包含至少一次迭代,所述点位置迭代间隔中迭代的点位置一致。
  9. 根据权利要求8所述的方法,其特征在于,所述点位置迭代间隔小于或等于所述参考迭代间隔。
  10. 根据权利要求7至9任一项所述的方法,其特征在于,所述量化参数还包括缩放系数,所述缩放系数与所述点位置同步更新。
  11. 根据权利要求7至9任一项所述的方法,其特征在于,所述量化参数还包括偏移量,所述偏移量与所述点位置同步更新。
  12. 根据权利要求7至9任一项所述的方法,其特征在于,所述方法还包括:
    根据所述当前检验迭代的待量化数据和所述当前检验迭代的量化数据，确定量化误差，其中，所述当前检验迭代的量化数据通过对所述当前检验迭代的待量化数据进行量化获得；
    根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽。
  13. 根据权利要求12所述的方法,其特征在于,所述根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽,包括:
    若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽;或者,
    若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽。
  14. 根据权利要求13所述的方法,其特征在于,若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽,包括:
    若所述量化误差大于或等于第一预设阈值,则根据第一预设位宽步长确定第一中间数据位宽;
    返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差小于所述第一预设阈值;其中,所述当前检验迭代的量化数据是根据所述第一中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
  15. 根据权利要求13所述的方法,其特征在于,所述若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,包括:
    若所述量化误差小于或等于第二预设阈值,则根据第二预设位宽步长确定第二中间数据位宽;
    返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差大于所述第二预设阈值;其中,所述当前检验迭代的量化数据是根据所述第二中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
  16. 根据权利要求1至15任一项所述的方法,其特征在于,所述获取待量化数据的数据变动幅度,包括:
    获取点位置的变动幅度;其中,所述点位置的变动幅度能够用于表征所述待量化数据的数据变动幅度,所述点位置的变动幅度与所述待量化数据的数据变动幅度正相关。
  17. 根据权利要求16所述的方法,其特征在于,所述获取点位置的变动幅度,包括:
    根据当前检验迭代之前的上一检验迭代对应的点位置,以及所述上一检验迭代之前的历史迭代对应的点位置,确定第一均值,其中,所述上一检验迭代为所述参考迭代间隔之前的上一迭代间隔对应的检验迭代;
    根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值;其中,所述当前检验迭代对应的点位置根据所述当前检验迭代对应的目标数据位宽和待量化数据确定;
    根据所述第一均值和所述第二均值确定第一误差,所述第一误差用于表征所述点位置的变动幅度。
  18. 根据权利要求17所述的方法,其特征在于,根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值,包括:
    获取预设数量的中间滑动平均值,其中,各个所述中间滑动平均值是根据所述当前检验迭代之前所述预设数量的检验迭代确定的;
    根据所述当前检验迭代的点位置以及所述预设数量的中间滑动平均值,确定所述第二均值。
  19. 根据权利要求17所述的方法，其特征在于，所述根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置，确定第二均值，包括：
    根据所述当前检验迭代对应的点位置以及所述第一均值,确定所述第二均值。
  20. 根据权利要求17所述的方法,其特征在于,所述方法还包括:
    根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值;其中,所述当前检验迭代的数据位宽调整值根据所述当前检验迭代的目标数据位宽和初始数据位宽确定。
  21. 根据权利要求20所述的方法,其特征在于,所述根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值,包括:
    当所述当前检验迭代的数据位宽调整值大于预设参数时,则根据所述当前检验迭代的数据位宽调整值减小所述第二均值;
    当所述当前检验迭代的数据位宽调整值小于预设参数时,则根据所述当前检验迭代的数据位宽调整值增大所述第二均值。
  22. 根据权利要求17所述的方法,其特征在于,所述根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,包括:
    根据所述第一误差确定所述第一目标迭代间隔,所述第一目标迭代间隔与所述第一误差负相关。
  23. 根据权利要求16至22任一项所述的方法,其特征在于,所述获取待量化数据的数据变动幅度,还包括:
    获取数据位宽的变化趋势;
    根据所述点位置的变动幅度和所述数据位宽的变化趋势,确定所述待量化数据的数据变动幅度。
  24. 根据权利要求23所述的方法,其特征在于,根据所述待量化数据的数据变动幅度,确定第一目标迭代间隔,还包括:
    根据获取的第一误差和第二误差,确定所述第一目标迭代间隔;其中,所述第一误差用于表征点位置的变动幅度,所述第二误差用于表征数据位宽的变化趋势。
  25. 根据权利要求23所述的方法,其特征在于,根据获取的第一误差和第二误差,确定所述第一目标迭代间隔,包括:
    将所述第一误差和所述第二误差中最大值作为目标误差;
    根据所述目标误差确定所述第一目标迭代间隔,其中,所述目标误差与所述第一目标迭代间隔负相关。
  26. 根据权利要求24或25所述的方法,其特征在于,所述第二误差根据量化误差确定;
    其中,所述量化误差根据当前检验迭代中待量化数据和所述当前检验迭代的量化数据确定,所述第二误差与所述量化误差正相关。
  27. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    当所述当前检验迭代大于或等于第二预设迭代,且第二误差大于预设误差值时,则根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
  28. 根据权利要求1-27任一项所述的方法,其特征在于,所述待量化数据为神经元数据、权值数据或梯度数据中的至少一种。
  29. 一种循环神经网络的量化参数调整装置,其特征在于,包括存储器和处理器,所述存储器存储有计算机程序,所述处理器执行所述计算机程序时,实现如权利要求1-28任一项所述的方法的步骤。
  30. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机程序,所述计算机程序被执行时,实现如权利要求1-28任一项所述的方法的步骤。
  31. 一种循环神经网络的量化参数调整装置,其特征在于,所述装置包括:
    获取模块,用于获取待量化数据的数据变动幅度;
    迭代间隔确定模块，用于根据所述待量化数据的数据变动幅度，确定第一目标迭代间隔，以根据所述第一目标迭代间隔调整所述循环神经网络运算中的量化参数，其中，所述第一目标迭代间隔包括至少一次迭代，所述循环神经网络的量化参数用于实现对所述循环神经网络运算中待量化数据的量化操作。
  32. 根据权利要求31所述的装置,其特征在于,所述装置还包括:
    预设间隔确定模块,用于在当前检验迭代小于或等于第一预设迭代时,根据预设迭代间隔调整所述量化参数。
  33. 根据权利要求31所述的装置,其特征在于,
    所述迭代间隔确定模块,还用于在当前检验迭代大于第一预设迭代时,根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
  34. 根据权利要求31至33任一项所述的装置,其特征在于,所述迭代间隔确定模块,包括:
    第二目标迭代间隔确定子模块，用于在当前检验迭代大于或等于第二预设迭代、且当前检验迭代需要进行量化参数调整时，根据所述第一目标迭代间隔和各周期中迭代的总数确定与所述当前检验迭代对应的第二目标迭代间隔；
    更新迭代确定子模块，用于根据所述第二目标迭代间隔确定出与所述当前检验迭代相对应的更新迭代，以在所述更新迭代中调整所述量化参数，所述更新迭代为所述当前检验迭代之后的迭代；
    其中,所述第二预设迭代大于第一预设迭代,所述循环神经网络的量化调整过程包括多个周期,所述多个周期中迭代的总数不一致。
  35. 根据权利要求34所述的装置,其特征在于,所述第二目标迭代间隔确定子模块,包括:
    更新周期确定子模块，用于根据当前检验迭代在当前周期中的迭代排序数和所述当前周期之后的周期中迭代的总数，确定出对应所述当前检验迭代的更新周期，所述更新周期中迭代的总数大于或等于所述迭代排序数；
    确定子模块，用于根据所述第一目标迭代间隔、所述迭代排序数、所述当前周期与更新周期之间的周期中迭代的总数，确定出所述第二目标迭代间隔。
  36. 根据权利要求34所述的装置,其特征在于,
    所述迭代间隔确定模块,还用于当所述循环神经网络的收敛程度满足预设条件时,则确定所述当前检验迭代大于或等于第二预设迭代。
  37. 根据权利要求34所述的装置,其特征在于,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述装置还包括:
    量化参数确定模块,用于根据当前检验迭代对应的目标数据位宽和所述当前检验迭代的待量化数据,确定参考迭代间隔中迭代对应的点位置,以调整所述循环神经网络运算中的点位置;
    其中,所述参考迭代间隔中迭代对应的点位置一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔。
  38. 根据权利要求34所述的装置,其特征在于,所述量化参数包括点位置,所述点位置为所述待量化数据对应的量化数据中小数点的位置;所述装置还包括:
    数据位宽确定模块,用于根据所述当前检验迭代对应的目标数据位宽,确定参考迭代间隔对应的数据位宽,其中,所述参考迭代间隔中迭代对应的数据位宽一致,所述参考迭代间隔包括所述第二目标迭代间隔或所述预设迭代间隔;
    量化参数确定模块，用于根据获取的点位置迭代间隔和参考迭代间隔对应的数据位宽，调整所述参考迭代间隔中迭代对应的点位置，以调整所述循环神经网络运算中的点位置；
    其中,所述点位置迭代间隔包含至少一次迭代,所述点位置迭代间隔中迭代的点位置一致。
  39. 根据权利要求38所述的装置,其特征在于,所述点位置迭代间隔小于或等于所述参考迭代间隔。
  40. 根据权利要求37至39任一项所述的装置,其特征在于,所述量化参数还包括缩放系数,所述缩放系数与所述点位置同步更新。
  41. 根据权利要求37至39任一项所述的装置,其特征在于,所述量化参数还包括偏移量,所述偏移量与所述点位置同步更新。
  42. 根据权利要求37至39任一项所述的装置,其特征在于,所述数据位宽确定模块包括:
    量化误差确定子模块，用于根据所述当前检验迭代的待量化数据和所述当前检验迭代的量化数据，确定量化误差，其中，所述当前检验迭代的量化数据通过对所述当前检验迭代的待量化数据进行量化获得；
    数据位宽确定子模块,用于根据所述量化误差,确定所述当前检验迭代对应的目标数据位宽。
  43. 根据权利要求42所述的装置，其特征在于，所述数据位宽确定子模块用于根据所述量化误差，确定所述当前检验迭代对应的目标数据位宽时，具体用于：
    若所述量化误差大于或等于第一预设阈值,则增加所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽;或者,
    若所述量化误差小于或等于第二预设阈值,则减小所述当前检验迭代对应的数据位宽,获得所述当前检验迭代对应的目标数据位宽。
  44. 根据权利要求43所述的装置，其特征在于，所述数据位宽确定子模块用于若所述量化误差大于或等于第一预设阈值，则增加所述当前检验迭代对应的数据位宽，获得所述当前检验迭代对应的目标数据位宽时，具体用于：
    若所述量化误差大于或等于第一预设阈值,则根据第一预设位宽步长确定第一中间数据位宽;
    返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差小于所述第一预设阈值;其中,所述当前检验迭代的量化数据是根据所述第一中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
  45. 根据权利要求43所述的装置，其特征在于，所述数据位宽确定子模块用于若所述量化误差小于或等于第二预设阈值，则减小所述当前检验迭代对应的数据位宽，获得所述当前检验迭代对应的目标数据位宽时，具体用于：
    若所述量化误差小于或等于第二预设阈值,则根据第二预设位宽步长确定第二中间数据位宽;
    返回执行根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据,确定量化误差,直至所述量化误差大于所述第二预设阈值;其中,所述当前检验迭代的量化数据是根据所述第二中间数据位宽对所述当前检验迭代的待量化数据进行量化获得。
  46. 根据权利要求31至45任一项所述的装置,其特征在于,所述获取模块包括:
    第一获取模块,用于获取点位置的变动幅度;其中,所述点位置的变动幅度能够用于表征所述待量化数据的数据变动幅度,所述点位置的变动幅度与所述待量化数据的数据变动幅度正相关。
  47. 根据权利要求46所述的装置,其特征在于,所述第一获取模块包括:
    第一均值确定单元,用于根据当前检验迭代之前的上一检验迭代对应的点位置,以及所述上一检验迭代之前的历史迭代对应的点位置,确定第一均值,其中,所述上一检验迭代为所述目标迭代间隔之前的上一迭代间隔对应的检验迭代;
    第二均值确定单元,用于根据所述当前检验迭代对应的点位置及所述当前检验迭代之前的历史迭代的点位置,确定第二均值;其中,所述当前检验迭代对应的点位置根据所述当前检验迭代对应的目标数据位宽和待量化数据确定;
    第一误差确定单元,用于根据所述第一均值和所述第二均值确定第一误差,所述第一误差用于表征所述点位置的变动幅度。
  48. 根据权利要求47所述的装置,其特征在于,所述第二均值确定单元具体用于:
    获取预设数量的中间滑动平均值,其中,各个所述中间滑动平均值是根据所述当前检验迭代之前所述预设数量的检验迭代确定的;
    根据所述当前检验迭代的点位置以及所述预设数量的中间滑动平均值,确定所述第二均值。
  49. 根据权利要求47所述的装置,其特征在于,所述第二均值确定单元具体用于根据所述当前检验迭代对应的点位置以及所述第一均值,确定所述第二均值。
  50. 根据权利要求47所述的装置,其特征在于,所述第二均值确定单元用于根据获取的所述当前检验迭代的数据位宽调整值,更新所述第二均值;
    其中,所述当前检验迭代的数据位宽调整值根据所述当前检验迭代的目标数据位宽和初始数据位宽确定。
  51. 根据权利要求50所述的装置，其特征在于，所述第二均值确定单元用于根据获取的所述当前检验迭代的数据位宽调整值，更新所述第二均值时，具体用于：
    当所述当前检验迭代的数据位宽调整值大于预设参数时,则根据所述当前检验迭代的数据位宽调整值减小所述第二均值;
    当所述当前检验迭代的数据位宽调整值小于预设参数时,则根据所述当前检验迭代的数据位宽调整值增大所述第二均值。
  52. 根据权利要求47所述的装置,其特征在于,所述迭代间隔确定模块用于根据所述第一误差确定所述目标迭代间隔,所述目标迭代间隔与所述第一误差负相关。
  53. 根据权利要求46至52任一项所述的装置,其特征在于,所述获取模块还包括:
    第二获取模块,用于获取数据位宽的变化趋势;根据所述点位置的变动幅度和所述数据位宽的变化趋势,确定所述待量化数据的数据变动幅度。
  54. 根据权利要求53所述的装置,其特征在于,所述迭代间隔确定模块还用于根据获取的第一误差和第二误差,确定所述目标迭代间隔;其中,所述第一误差用于表征点位置的变动幅度,所述第二误差用于表征数据位宽的变化趋势。
  55. 根据权利要求53所述的装置,其特征在于,所述迭代间隔确定模块用于根据获取的第一误差和第二误差,确定所述目标迭代间隔时,具体用于:
    将所述第一误差和所述第二误差中最大值作为目标误差;
    根据所述目标误差确定所述目标迭代间隔,其中,所述目标误差与所述目标迭代间隔负相关。
  56. 根据权利要求54或55所述的装置,其特征在于,所述第二误差根据量化误差确定;
    其中,所述量化误差根据所述当前检验迭代中待量化数据和所述当前检验迭代的量化数据确定,所述第二误差与所述量化误差正相关。
  57. 根据权利要求34所述的装置,其特征在于,
    所述迭代间隔确定模块,还用于当所述当前检验迭代大于或等于第二预设迭代,且第二误差大于预设误差值时,则根据所述待量化数据的数据变动幅度确定第一目标迭代间隔。
PCT/CN2020/110142 2019-08-27 2020-08-20 循环神经网络的量化参数调整方法、装置及相关产品 WO2021036892A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/622,647 US20220366238A1 (en) 2019-08-27 2020-08-20 Method and apparatus for adjusting quantization parameter of recurrent neural network, and related product

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201910798228 2019-08-27
CN201910798228.2 2019-08-27
CN201910888141.4 2019-09-19
CN201910888141.4A CN112085150A (zh) 2019-06-12 2019-09-19 量化参数调整方法、装置及相关产品

Publications (1)

Publication Number Publication Date
WO2021036892A1 true WO2021036892A1 (zh) 2021-03-04

Family

ID=74683480

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/110142 WO2021036892A1 (zh) 2019-08-27 2020-08-20 循环神经网络的量化参数调整方法、装置及相关产品

Country Status (2)

Country Link
US (1) US20220366238A1 (zh)
WO (1) WO2021036892A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20210156538A (ko) * 2020-06-18 2021-12-27 삼성전자주식회사 뉴럴 네트워크를 이용한 데이터 처리 방법 및 데이터 처리 장치

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063820A (zh) * 2018-06-07 2018-12-21 中国科学技术大学 利用时频联合长时循环神经网络的数据处理方法
US20190122119A1 (en) * 2017-10-25 2019-04-25 SparkCognition, Inc. Adjusting automated neural network generation based on evaluation of candidate neural networks
CN109800877A (zh) * 2019-02-20 2019-05-24 腾讯科技(深圳)有限公司 神经网络的参数调整方法、装置及设备


Also Published As

Publication number Publication date
US20220366238A1 (en) 2022-11-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20859559; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20859559; Country of ref document: EP; Kind code of ref document: A1)