CN113780549A - Overflow-aware quantization model training method, apparatus, medium and terminal device - Google Patents

Overflow-aware quantization model training method, apparatus, medium and terminal device

Info

Publication number
CN113780549A
CN113780549A (application number CN202110898412.1A)
Authority
CN
China
Prior art keywords
quantization
network layer
overflow
target data
scaling factor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110898412.1A
Other languages
Chinese (zh)
Inventor
谭明奎 (Tan Mingkui)
黎浩坤 (Li Haokun)
李焕童 (Li Huantong)
王耀威 (Wang Yaowei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peng Cheng Laboratory filed Critical Peng Cheng Laboratory
Priority to CN202110898412.1A
Publication of CN113780549A
Priority to PCT/CN2022/098237

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Abstract

The application discloses an overflow-aware quantization model training method, apparatus, medium and terminal device. In the embodiments, the quantization scaling coefficient, and with it the quantization scaling factor, is adjusted based on the constrained bit width of the accumulator that runs the network layer, so that the output data of the network layer does not overflow. This improves model performance, allows the network layer to run on accumulators with a low constrained bit width, increases the number of accumulators that can run the network layer in parallel, and thus speeds up the operation of the network layer.

Description

Overflow-aware quantization model training method, apparatus, medium and terminal device
Technical Field
The present application relates to the field of quantization training technologies, and in particular to an overflow-aware quantization model training method, apparatus, medium and terminal device.
Background
With the development of deep neural networks, the number and complexity of network parameters have increased sharply, making such networks increasingly difficult to deploy on devices with limited computation, storage and power budgets, such as mobile phones, unmanned aerial vehicles and robots. Quantization is the most common method for deploying complex models to such devices: it maps parameters from high-bit floating-point numbers to low-bit fixed-point numbers and approximates the original floating-point computation with fixed-point operations, thereby accelerating computation and saving storage. Existing quantization model training methods can quantize model parameters to low bit widths, run inference with the higher-performance integer and shift operations, and largely maintain model performance. However, they focus only on the quantization bit width of the parameters and ignore the bit-width constraint of the accumulator used to store intermediate values such as convolution results; they therefore often require an accumulator with a large bit width and have difficulty fully exploiting the hardware. If an accumulator with a small bit width is used directly, the intermediate results of the convolution operation frequently overflow, which in turn degrades model performance.
Thus, the prior art has yet to be improved and enhanced.
Disclosure of Invention
In view of the above deficiencies of the prior art, the technical problem to be solved by the present application is to provide an overflow-aware quantization model training method, apparatus, medium and terminal device.
In order to solve the above technical problem, a first aspect of the embodiments of the present application provides an overflow-aware quantization model training method, where the method includes:
the method comprises the steps of carrying out quantization processing on target data of a network layer in a preset network model, and determining output data of the network layer based on the quantized target data and the network layer, wherein the target data comprises at least one of input data and network parameters of the network layer;
determining the number of overflow elements corresponding to the output data, and adjusting the quantization scaling coefficient corresponding to the network layer based on the number of overflow elements so as to adjust the quantization scaling factor corresponding to the network layer, wherein the number of overflow elements is the number of elements in the output data that are smaller than the minimum value representable by a preset number of bits or larger than the maximum value representable by the preset number of bits.
In the overflow-aware quantization model training method, the preset number of bits is the constrained bit width of the accumulator used to run the network layer.
The overflow-aware quantization model training method further comprises the following steps:
and inputting the output data into the next network layer after passing through the activation layer.
In the overflow-aware quantization model training method, the quantizing of the target data of the network layer in the preset network model specifically includes:
obtaining a quantization range corresponding to target data, and determining a quantization scaling factor of the target data based on the quantization range;
and performing quantization operation and inverse quantization operation on the target data based on the quantization scaling factor to obtain quantized target data.
In the overflow-aware quantization model training method, the network layer is configured with a quantization scaling coefficient, and the obtaining a quantization range corresponding to the target data and determining a quantization scaling factor of the target data based on the quantization range specifically includes:
acquiring an upper limit value and a lower limit value of a quantization range corresponding to the target data, and determining candidate quantization scaling factors based on the upper limit value, the lower limit value and a preset bit;
and acquiring a product of the candidate quantization scaling factor and the quantization scaling coefficient, and taking the product as the quantization scaling factor of the target data.
In the overflow-aware quantization model training method, the calculation formula of the candidate quantization scaling factor is:

S1 = max(abs(rmax), abs(rmin)) / (2^(b-1) - 1)

where rmax is the upper limit of the quantization range of the target data, rmin is the lower limit, S1 denotes the candidate quantization scaling factor, and b denotes the predetermined bit width.
In the overflow-aware quantization model training method, the quantization scaling coefficient is adjusted according to a formula that is rendered as an image in the original publication and is not reproduced here, in which lc denotes the maximum value of the learning rate, a second image-rendered symbol denotes the real-time learning rate, bs denotes the number of samples in a training batch, and γ is an update-speed adjustment factor.
In the overflow-aware quantization model training method, the network layer is a convolutional layer into which a batch normalization layer has been fused.
A second aspect of the embodiments of the present application provides an overflow-aware quantitative model training apparatus, where the apparatus includes:
a quantization module, configured to perform quantization processing on target data of a network layer in a preset network model and determine output data of the network layer based on the quantized target data and the network layer, where the target data comprises at least one of input data and network parameters of the network layer;
and a sensing module, configured to determine the number of overflow elements corresponding to the output data and adjust the quantization scaling coefficient corresponding to the network layer based on the number of overflow elements, so as to adjust the quantization scaling factor corresponding to the network layer, where the number of overflow elements is the number of elements in the output data that are smaller than the minimum value representable by a preset number of bits or larger than the maximum value representable by the preset number of bits.
A third aspect of the embodiments of the present application provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the overflow-aware quantization model training method as described in any of the above.
A fourth aspect of the embodiments of the present application provides a terminal device, including: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the overflow-aware quantization model training method as described in any one of the above.
Advantageous effects: compared with the prior art, the method quantizes the target data of a network layer in a preset network model, determines the output data of the network layer based on the quantized target data and the network layer, determines the number of overflow elements corresponding to the output data, and adjusts the quantization scaling coefficient corresponding to the network layer based on the number of overflow elements, thereby adjusting the quantization scaling factor corresponding to the network layer. Because the quantization scaling coefficient is adjusted according to the constrained bit width of the accumulator that runs the network layer, the output data of the network layer does not overflow, which improves model performance; at the same time, adjusting the quantization scaling coefficient according to the accumulator's constrained bit width allows the network layer to run on accumulators with a low constrained bit width, which increases the number of accumulators that can run the network layer in parallel and speeds up the operation of the network layer.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without any inventive work.
Fig. 1 is a flowchart of the overflow-aware quantization model training method provided in the present application.
Fig. 2 is a schematic flowchart of the overflow-aware quantization model training method provided in the present application.
Fig. 3 is a schematic diagram of the quantization and inverse quantization operations in the overflow-aware quantization model training method provided in the present application.
Fig. 4 is a schematic structural diagram of the overflow-aware quantization model training apparatus provided in the present application.
Fig. 5 is a schematic structural diagram of the terminal device provided in the present application.
Detailed Description
The present application provides an overflow-aware quantization model training method, apparatus, medium and terminal device. To make the purpose, technical solution and effect of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit it.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
It should be understood that the sequence numbers of the steps in this embodiment do not imply an execution order; the execution order of each process is determined by its function and internal logic, and should not constitute any limitation on the implementation of this embodiment.
The inventors have found that with the development of deep neural networks, the number and complexity of network parameters have increased sharply, making such networks increasingly difficult to deploy on devices with limited computation, storage and power budgets, such as mobile phones, unmanned aerial vehicles and robots. Quantization is the most common method for deploying complex models to such devices: it maps parameters from high-bit floating-point numbers to low-bit fixed-point numbers and approximates the original floating-point computation with fixed-point operations, thereby accelerating computation and saving storage. Existing quantization model training methods can quantize model parameters to low bit widths, run inference with the higher-performance integer and shift operations, and largely maintain model performance. However, they focus only on the quantization bit width of the parameters and ignore the bit-width constraint of the accumulator used to store intermediate values such as convolution results; when the accumulator uses a low bit width, the intermediate results of the convolution operation frequently overflow, which degrades model performance. For example, if the accumulator of a low-power device is constrained to 16 bits, two 8-bit multiply-accumulate operations can already overflow.
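To make the 16-bit example concrete: with signed 8-bit operands, a single product can reach (-128) * (-128) = 16384, so accumulating just two such products gives 32768, one more than the int16 maximum of 32767. The snippet below (an illustrative sketch, not from the patent) shows the wraparound:

```python
import numpy as np

# Worst-case signed 8-bit products: (-128) * (-128) = 16384 each.
a = np.full(2, -128, dtype=np.int8)
w = np.full(2, -128, dtype=np.int8)

products = a.astype(np.int16) * w.astype(np.int16)  # [16384, 16384], each fits in int16

acc16 = products.sum(dtype=np.int16)  # 16-bit accumulator: 32768 wraps to -32768
acc32 = products.sum(dtype=np.int32)  # 32-bit accumulator: exact 32768

print(acc16, acc32)  # -32768 32768
```

The 32-bit accumulation is exact, while the 16-bit accumulation silently wraps, which is precisely the intermediate-result overflow described above.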
In order to solve the above problem, in the embodiments of the present application, the target data of a network layer in a preset network model is quantized, the output data of the network layer is determined based on the quantized target data and the network layer, the number of overflow elements corresponding to the output data is determined, and the quantization scaling coefficient corresponding to the network layer is adjusted based on the number of overflow elements, thereby adjusting the quantization scaling factor corresponding to the network layer. Because the quantization scaling coefficient is adjusted according to the constrained bit width of the accumulator that runs the network layer, the output data of the network layer does not overflow, which improves model performance; at the same time, the network layer can run on accumulators with a low constrained bit width, which increases the number of accumulators that can run the network layer in parallel and speeds up the operation of the network layer.
The following further describes the content of the application by describing the embodiments with reference to the attached drawings.
Quantization training aims to simulate the quantization process during network training, so that the network can adaptively learn parameter values that minimize the quantization loss. Existing approaches are generally divided into post-training quantization (Post-training Quantization) and quantization-aware training (Quantization-aware Training). Post-training quantization quantizes the parameters of a neural network model directly after the floating-point model has been trained; it works well for models with a large number of parameters, with little performance loss, but causes a large performance drop for models with few parameters. Quantization-aware training simulates the quantization behavior during training, storing the fixed-point parameters with floating-point parameters while training; at final inference time, the model operates directly on the fixed-point parameters.
The overflow-aware quantization model training method provided in this embodiment adds an overflow-aware processing step to existing quantization-aware training: the quantization scaling coefficient is adjusted during quantization training, thereby adjusting the quantization scaling factor, which improves model performance. Because the quantization scaling coefficient is adjusted according to the constrained bit width of the accumulator, the network layer can run on accumulators with a low constrained bit width, which increases the number of accumulators that can run the network layer in parallel and speeds up the operation of the network layer.
As shown in Fig. 1 and Fig. 2, the overflow-aware quantization model training method provided in this embodiment specifically includes:
s10, quantizing the target data of the network layer in the preset network model, and determining the output data of the network layer based on the quantized target data and the network layer.
Specifically, the preset network model may be a preset or pre-trained neural network model, and it comprises at least one network layer. For example, the preset network model may be a convolutional neural network model, which may include convolutional layers, activation layers, pooling layers, fully-connected layers, and the like; for example, the convolutional neural network model VGG16 may sequentially include convolutional layer 1, convolutional layer 2, pooling layer 1, convolutional layer 3, convolutional layer 4, pooling layer 2, convolutional layer 5, convolutional layer 6, convolutional layer 7, pooling layer 3, convolutional layer 8, convolutional layer 9, convolutional layer 10, pooling layer 4, convolutional layer 11, convolutional layer 12, convolutional layer 13, fully-connected layer 1, fully-connected layer 2 and fully-connected layer 3. The network layer is one layer of the preset network model; for example, if the preset network model is the convolutional neural network model VGG16, the network layer may be convolutional layer 1, convolutional layer 2, convolutional layer 10, and so on. In one implementation manner of this embodiment, the network layer is a convolutional layer into which a batch normalization layer has been fused; it will be appreciated that fusing the batch normalization layer into the convolutional layer saves computation.
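For background, this fusion can be done with the standard batch-norm folding identity; the sketch below states it under the usual definitions (the helper and its signature are ours, not the patent's): a BN layer with scale gamma, shift beta, running mean mu and running variance sigma^2 folds into the preceding convolution as W' = W * gamma / sqrt(sigma^2 + eps) and b' = (b - mu) * gamma / sqrt(sigma^2 + eps) + beta.

```python
import numpy as np

def fold_batchnorm(conv_w, conv_b, bn_gamma, bn_beta, bn_mean, bn_var, eps=1e-5):
    """Fold a BatchNorm layer into the preceding convolution (standard
    folding identity, sketched as background; not the patent's own code).

    conv_w: (out_ch, in_ch, kh, kw) weights; conv_b: (out_ch,) bias.
    """
    scale = bn_gamma / np.sqrt(bn_var + eps)           # per output channel
    folded_w = conv_w * scale[:, None, None, None]     # rescale each output filter
    folded_b = (conv_b - bn_mean) * scale + bn_beta
    return folded_w, folded_b
```

After folding, the fused convolution produces the same outputs as conv followed by BN, so only one layer needs to be quantized and run on the accumulator.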
The target data includes at least one of the input data and the network parameters of the network layer; that is, the target data may be the input data of the network layer, the network parameters of the network layer, or both. In addition, the data type of the target data is floating point, and quantization converts it to an integer type. For example, suppose the target parameter is a 32-bit floating-point number, the quantized data type is 8-bit fixed-point data, and the quantization scaling factor is S. The target parameter can then be quantized into [-127, 127] based on the quantization formula: e.g., the value of the 3rd black point on the 1st row in Fig. 3 is divided by S, giving a value in the interval (-126, -125); that value is rounded to -126 and multiplied by S to obtain the dequantized value, i.e., the value of the 2nd black point on the 1st row. The quantization formula and the inverse quantization formula may respectively be:

q = round(r / S)

rdequant = q * S

where q denotes the post-quantization value, r denotes the pre-quantization value, round(·) denotes the rounding operation, rdequant denotes the dequantized value, and S is the quantization scaling factor.
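Written out as code, the two formulas above give the following round trip (a minimal sketch; the helper names and the example value of S are ours, and the clipping range [-127, 127] follows the 8-bit example in the text):

```python
import numpy as np

def quantize(r, S, b=8):
    """q = round(r / S), clipped to the signed b-bit symmetric range."""
    qmax = 2 ** (b - 1) - 1                     # 127 for b = 8
    return np.clip(np.round(r / S), -qmax, qmax).astype(np.int32)

def dequantize(q, S):
    """rdequant = q * S, the floating-point value approximating r."""
    return q * S

S = 0.05          # example scaling factor (assumed for illustration)
r = -6.28         # pre-quantization floating-point value
q = quantize(r, S)
print(q, dequantize(q, S))  # -126, ~ -6.3
```

Running this reproduces the Fig. 3 narrative: -6.28 / 0.05 = -125.6 lies in the interval (-126, -125), rounds to -126, and dequantizes back to approximately -6.3.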
In an implementation manner of this embodiment, the performing quantization processing on target data of a network layer in a preset network model specifically includes:
obtaining a quantization range corresponding to target data, and determining a quantization scaling factor of the target data based on the quantization range;
and performing quantization operation and inverse quantization operation on the target data based on the quantization scaling factor to obtain quantized target data.
Specifically, the quantization scaling factor is the coefficient used to quantize the target data from high-bit floating-point data to low-bit fixed-point data. Once the quantization scaling factor is obtained, the target data can be quantized based on it, where the quantization formula may be:

q = round(r / S)

where r denotes the pre-quantization value, q denotes the post-quantization value, S denotes the quantization scaling factor, and round(·) denotes the rounding operation.
In an implementation manner of this embodiment, a quantization scaling coefficient corresponding to the network layer is preset, and the quantization scaling coefficient is used to adjust the mapping relationship between the integer range and the quantization range. It is to be understood that the quantization scaling coefficient adjusts the quantization scaling factor, and thereby the mapping relationship between the integer range and the quantization range. That is, a candidate scaling factor may first be determined according to the quantization range, and that candidate factor may then be adjusted by the quantization scaling coefficient. Based on this, the obtaining a quantization range corresponding to the target data and determining the quantization scaling factor of the target data based on the quantization range specifically include:
acquiring an upper limit value and a lower limit value of a quantization range corresponding to the target data, and determining candidate quantization scaling factors based on the upper limit value, the lower limit value and a preset bit;
and acquiring a product of the candidate quantization scaling factor and the quantization scaling coefficient, and taking the product as the quantization scaling factor of the target data.
Specifically, the target data may be the input data and/or the network parameters, so when determining the quantization scaling factor corresponding to the target data, the data contained in the target data must first be determined. If the target data comprises only the input data, the upper and lower limits of the quantization range of the input data and the predetermined bit width are obtained directly, and the candidate quantization scaling factor is then determined; if the target data comprises only the network parameters, the upper and lower limits of the quantization range of the network parameters and the predetermined bit width are obtained directly, and the candidate quantization scaling factor is then determined; if the target data comprises both the input data and the network parameters, the upper and lower limits of the quantization ranges of the input data and of the network parameters, together with the predetermined bit width, are obtained respectively, and the candidate quantization scaling factors are then determined.
In an implementation manner of this embodiment, the calculation formula of the candidate quantization scaling factor and the calculation formula of the quantization scaling factor may be:

S1 = max(abs(rmax), abs(rmin)) / (2^(b-1) - 1)

S = α * S1

where rmax is the upper limit of the quantization range of the target data, rmin is the lower limit, S1 denotes the candidate quantization scaling factor, b denotes the predetermined bit width, S denotes the quantization scaling factor, α denotes the quantization scaling coefficient, abs(·) denotes taking the absolute value, and max(·) denotes taking the maximum.
In one implementation manner of this embodiment, the target data comprises input data and network parameters, the network parameters may comprise a weight parameter and a bias parameter, and the quantization scaling factor of the bias parameter may be determined from the quantization scaling factor of the input data and the quantization scaling factor of the weight parameter. That is, the quantization scaling factor of the input data is first determined from its upper limit, lower limit, the predetermined bit width and the quantization scaling coefficient; the quantization scaling factor of the weight parameter is then determined from its upper limit, lower limit, the predetermined bit width and the quantization scaling coefficient; and the quantization scaling factor of the bias parameter is finally determined from the other two. For example, the quantization scaling factor of the bias parameter may be equal to the product of the quantization scaling factor of the input data and that of the weight parameter, or may be obtained by weighting the two, and so on.
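Putting this subsection together, the scale factors might be computed as follows (a sketch under the formulas above; the function names and example tensors are ours, and the bias scale uses the product rule, the first of the two options mentioned):

```python
import numpy as np

def candidate_scale(r, b=8):
    """S1 = max(abs(rmax), abs(rmin)) / (2^(b-1) - 1) for the tensor r."""
    rmax, rmin = float(r.max()), float(r.min())
    return max(abs(rmax), abs(rmin)) / (2 ** (b - 1) - 1)

def quant_scale(r, alpha, b=8):
    """S = alpha * S1: the coefficient alpha rescales the candidate factor."""
    return alpha * candidate_scale(r, b)

x = np.random.randn(1, 3, 32, 32)   # layer input (example data)
w = np.random.randn(16, 3, 3, 3)    # layer weights (example data)
alpha = 1.0                          # quantization scaling coefficient (adjusted during training)

S_x = quant_scale(x, alpha)          # input scale
S_w = quant_scale(w, alpha)          # weight scale
S_b = S_x * S_w                      # bias scale as the product of the two
```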
In an implementation manner of this embodiment, once the quantization scaling factor is obtained, the target data is quantized and dequantized with it to obtain a dequantized value approximating the target data, where the dequantized value may be calculated as:

rdequant = round(r / S) * S

where rdequant denotes the dequantized value, r denotes the pre-quantization value, S denotes the quantization scaling factor, and round(·) denotes the rounding operation.
In one implementation manner of this embodiment, after the quantization and inverse quantization operations are performed on the target data, the network parameters in the target data are configured in the network layer, the input data in the target data are input into the network layer, and the network layer processes the input data to obtain the output data. Of course, when the target data comprises only the network parameters, the network parameters after the quantization and inverse quantization operations are configured in the network layer, and the input data, which has not undergone these operations, is input into the network layer; when the target data comprises only the input data, the network parameters of the network layer are kept unchanged, and the input data after the quantization and inverse quantization operations is input into the network layer directly.
In an implementation manner of this embodiment, the method further includes inputting the output data into the next network layer after it passes through the activation layer. Before being input into the activation layer, the output data may be quantized, passed through simulated integer inference, and dequantized so as to convert it back into floating-point data; the output data converted into floating-point data is then input into the activation layer, and the output of the activation layer is input into the next network layer as the output data.
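The order of operations described in this paragraph can be sketched for a single layer as below (our own minimal rendering; a matrix product stands in for the convolution, and ReLU is assumed as the activation):

```python
import numpy as np

def _quant(r, S, b=8):
    """q = round(r / S), clipped to the signed b-bit symmetric range."""
    qmax = 2 ** (b - 1) - 1
    return np.clip(np.round(r / S), -qmax, qmax).astype(np.int64)

def fake_quant_linear(x, w, S_x, S_w, b=8):
    """Simulated integer inference for one layer: quantize input and
    weights, accumulate as integers, then dequantize for the activation."""
    acc = _quant(x, S_x, b) @ _quant(w, S_w, b)   # integer accumulation
    return acc * (S_x * S_w)                       # dequantize: r = q * S

x = np.random.randn(4, 8)
w = np.random.randn(8, 16)
y = fake_quant_linear(x, w, S_x=0.05, S_w=0.02)   # floats approximating x @ w
out = np.maximum(y, 0.0)   # activation (ReLU assumed here), fed to the next layer
```

The dequantization scale of the accumulated product is S_x * S_w, since x @ w is approximated by (q_x * S_x) @ (q_w * S_w) = S_x * S_w * (q_x @ q_w).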
S20, determining the number of overflow elements corresponding to the output data, and adjusting the quantization scaling coefficient corresponding to the network layer based on the number of overflow elements so as to adjust the quantization scaling factor corresponding to the network layer.
Specifically, the number of overflow elements is the number of elements in the output data that are smaller than the minimum value representable by the preset number of bits or larger than the maximum value representable by the preset number of bits, where the preset number of bits is the constrained bit width of the accumulator used to run the network layer. For example, when a 128-bit register is divided into 8 accumulators, the constrained bit width of each accumulator is 16 bits; when the 128-bit register is divided into 4 accumulators, the constrained bit width of each accumulator is 32 bits. With 8 accumulators, the convolutions at 8 element positions can run in parallel, while with 4 accumulators only 4 element positions can run in parallel; that is, with a 16-bit constraint the computational efficiency of the register is double that with a 32-bit constraint. Therefore, in this embodiment the quantization scaling coefficient is adjusted according to the number of overflow elements, thereby adjusting the quantization scaling factor; this guarantees that the accumulator does not overflow during operation, and lowering the constrained bit width of the accumulator improves the computational efficiency of the network layer.
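Counting the overflow elements used in step S20 is then a range check against the accumulator's constrained bit width (a sketch with names of our own; acc_bits=16 matches the register example above):

```python
import numpy as np

def count_overflow(outputs, acc_bits=16):
    """Number of output elements outside the signed acc_bits range.

    outputs: integer accumulator results of the network layer, computed
    at full precision here so that out-of-range values remain visible.
    """
    lo = -(2 ** (acc_bits - 1))        # minimum representable value, -32768
    hi = 2 ** (acc_bits - 1) - 1       # maximum representable value, 32767
    return int(np.sum((outputs < lo) | (outputs > hi)))

acc = np.array([12000, -40000, 32767, 33000], dtype=np.int64)
print(count_overflow(acc))             # 2 (elements -40000 and 33000)
```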
In an implementation manner of this embodiment, the adjustment formula corresponding to the quantization scaling coefficient is rendered as an image in the original publication and is not reproduced here; in it, lc denotes the maximum value of the learning rate, a second image-rendered symbol denotes the real-time learning rate, bs denotes the number of samples in a training batch, and γ is an update-speed adjustment factor. In one implementation, γ may be set to 10 by default.
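The exact update rule appears only as an image in the publication, so the sketch below is a plausible reading of the variables it names rather than the patent's formula: the quantization scaling coefficient grows at a real-time learning rate proportional to the overflow rate of the batch, scaled by γ and capped by the maximum learning rate lc (a larger α gives a larger S, which shrinks the quantized integers and suppresses overflow).

```python
def adjust_alpha(alpha, n_overflow, bs, gamma=10.0, lr_max=0.1):
    """Hypothetical reading of the image-only update rule: increase the
    quantization scaling coefficient in proportion to the overflow rate,
    with the real-time learning rate capped at lr_max."""
    lr = min(lr_max, gamma * n_overflow / bs)   # real-time learning rate
    return alpha * (1.0 + lr)                   # larger alpha -> larger S -> fewer overflows

alpha = adjust_alpha(alpha=1.0, n_overflow=32, bs=256, gamma=10.0)
```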
By way of example: assume the preset network model is the convolutional neural network model VGG-16, the target data includes the input data, a weight parameter and a bias parameter, and several different bit widths are used for quantization. The weight parameter, the input data, the bias parameter and the fixed-point multiplier (the multiplier used in integer inference, where fixed-point multiplication and shift operations replace the original floating-point multiplication) are all constrained by the constrained bit width of the accumulator; the resulting training results are shown in Table 1, where the accuracy loss at 16 bits and 8 bits is less than 1%, and the accuracy loss at 4 bits is less than 5%.
Table 1: Overflow-aware quantization training results of the VGG-16 model (the table is rendered as an image in the original publication and is not reproduced here).
In summary, the overflow-aware quantization model training method provided by the present application can perform quantization training under a limited register bit width, can completely avoid register overflow, and causes almost no performance loss for the 16-bit and 8-bit models obtained by training.
Based on the above overflow-aware quantization model training method, this embodiment provides an overflow-aware quantization model training apparatus, as shown in fig. 4, the apparatus includes:
a quantization module 100, configured to perform quantization processing on target data of a network layer in a preset network model, and determine output data of the network layer based on the quantized target data and the network layer, where the target data includes at least one of input data and network parameters of the network layer;
a sensing module 200, configured to determine the number of overflow elements corresponding to the output data and adjust the quantization scaling coefficient corresponding to the network layer based on the number of overflow elements, so as to adjust the quantization scaling factor corresponding to the network layer, where the number of overflow elements is the number of elements in the output data that are smaller than the minimum value representable by a preset number of bits or larger than the maximum value representable by the preset number of bits.
Based on the overflow-aware quantization model training method, the present embodiment provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the overflow-aware quantization model training method according to the above embodiment.
Based on the above overflow-aware quantization model training method, the present application further provides a terminal device, as shown in Fig. 5, which includes at least one processor (processor) 20, a display screen 21, and a memory (memory) 22, and may further include a communication interface (Communications Interface) 23 and a bus 24. The processor 20, the display screen 21, the memory 22 and the communication interface 23 can communicate with one another through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may call logic instructions in the memory 22 to perform the methods in the embodiments described above.
Furthermore, the logic instructions in the memory 22 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product.
The memory 22, which is a computer-readable storage medium, may be configured to store a software program, a computer-executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 executes the functional application and data processing, i.e. implements the method in the above-described embodiments, by executing the software program, instructions or modules stored in the memory 22.
The memory 22 may include a storage program area and a storage data area, where the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to the use of the terminal device, and the like. Further, the memory 22 may include high-speed random access memory and may also include non-volatile memory, for example various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk; it may also be a transient storage medium.
In addition, the specific processes by which the storage medium and the instruction processors in the terminal device load and execute the instructions have been described in detail in the method above and are not repeated here.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. An overflow-aware quantization model training method, characterized by comprising the following steps:
the method comprises the steps of carrying out quantization processing on target data of a network layer in a preset network model, and determining output data of the network layer based on the quantized target data and the network layer, wherein the target data comprises at least one of input data and network parameters of the network layer;
determining the number of overflow elements corresponding to the output data, and adjusting the quantization scaling coefficient corresponding to the network layer based on the number of overflow elements so as to adjust the quantization scaling factor corresponding to the network layer, wherein the number of overflow elements is the number of elements in the output data that are smaller than the minimum value representable by a preset number of bits or larger than the maximum value representable by the preset number of bits.
2. The overflow-aware quantization model training method of claim 1, wherein the preset number of bits is the constrained bit width of the accumulator used to run the network layer.
3. The overflow-aware quantization model training method according to claim 1, wherein the quantizing the target data of the network layer in the preset network model specifically comprises:
obtaining a quantization range corresponding to target data, and determining a quantization scaling factor of the target data based on the quantization range;
and performing quantization operation and inverse quantization operation on the target data based on the quantization scaling factor to obtain quantized target data.
4. The overflow-aware quantization model training method according to claim 1, wherein the network layer is configured with a quantization scaling coefficient, and the obtaining a quantization range corresponding to the target data and determining a quantization scaling factor of the target data based on the quantization range specifically comprises:
acquiring an upper limit value and a lower limit value of a quantization range corresponding to the target data, and determining candidate quantization scaling factors based on the upper limit value, the lower limit value and a preset bit;
and acquiring a product of the candidate quantization scaling factor and the quantization scaling coefficient, and taking the product as the quantization scaling factor of the target data.
5. The overflow-aware quantization model training method of claim 4, wherein the candidate quantization scaling factor is calculated as:

S1 = max(abs(rmax), abs(rmin)) / (2^(b-1) - 1)

where rmax is the upper limit of the quantization range of the target data, rmin is the lower limit, S1 denotes the candidate quantization scaling factor, and b denotes the predetermined bit width.
6. The overflow-aware quantization model training method of claim 1, wherein the quantization scaling coefficient is adjusted according to a formula that is rendered as an image in the original publication and is not reproduced here, in which lc denotes the maximum value of the learning rate, a second image-rendered symbol denotes the real-time learning rate, bs denotes the number of samples in a training batch, and γ is an update-speed adjustment factor.
7. The overflow-aware quantization model training method of claim 1, wherein the network layer is a convolutional layer into which a batch normalization layer has been fused.
8. An overflow-aware quantization model training apparatus, characterized by comprising:
a quantization module, configured to perform quantization processing on target data of a network layer in a preset network model and determine output data of the network layer based on the quantized target data and the network layer, where the target data comprises at least one of input data and network parameters of the network layer;
and a sensing module, configured to determine the number of overflow elements corresponding to the output data and adjust the quantization scaling coefficient corresponding to the network layer based on the number of overflow elements, so as to adjust the quantization scaling factor corresponding to the network layer, where the number of overflow elements is the number of elements in the output data that are smaller than the minimum value representable by a preset number of bits or larger than the maximum value representable by the preset number of bits.
9. A computer-readable storage medium storing one or more programs, wherein the one or more programs are executable by one or more processors to implement the steps in the overflow-aware quantization model training method according to any one of claims 1-7.
10. A terminal device, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the overflow-aware quantization model training method according to any one of claims 1-7.
CN202110898412.1A 2021-08-05 2021-08-05 Overflow-aware quantization model training method, apparatus, medium and terminal device Pending CN113780549A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110898412.1A CN113780549A (en) Overflow-aware quantization model training method, apparatus, medium and terminal device
PCT/CN2022/098237 WO2023011002A1 (en) 2021-08-05 2022-06-10 Overflow-aware quantization model training method and apparatus, medium and terminal device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110898412.1A CN113780549A (en) Overflow-aware quantization model training method, apparatus, medium and terminal device

Publications (1)

Publication Number Publication Date
CN113780549A (en) 2021-12-10

Family

ID=78836769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110898412.1A Pending CN113780549A (en) 2021-08-05 2021-08-05 Overflow-aware quantization model training method, apparatus, medium and terminal device

Country Status (2)

Country Link
CN (1) CN113780549A (en)
WO (1) WO2023011002A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023011002A1 (en) * 2021-08-05 2023-02-09 Peng Cheng Laboratory Overflow-aware quantization model training method and apparatus, medium and terminal device
CN116506617A (en) * 2023-06-28 2023-07-28 鹏城实验室 Image shallow compression code rate control method and device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116739039A (en) * 2023-05-05 2023-09-12 Beijing Baidu Netcom Science and Technology Co., Ltd. Quantization method, device, equipment and medium of distributed deployment model
CN116644796B (en) * 2023-07-27 2023-10-03 Midea Robozone Technology Co., Ltd. Network model quantization method, voice data processing method, device and chip

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089717B2 (en) * 2016-04-05 2018-10-02 Flipboard, Inc. Image scaling using a convolutional neural network
CN109040759B (en) * 2018-07-27 2021-11-16 西安空间无线电技术研究所 Image parallel compression device and method
CN110858487A (en) * 2018-08-23 2020-03-03 北京嘉楠捷思信息技术有限公司 Audio signal scaling processing method and device
CN113780549A (en) * 2021-08-05 2021-12-10 鹏城实验室 Quantitative model training method, device, medium and terminal equipment for overflow perception

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023011002A1 (en) * 2021-08-05 2023-02-09 Peng Cheng Laboratory Overflow-aware quantization model training method and apparatus, medium and terminal device
CN116506617A (en) * 2023-06-28 2023-07-28 鹏城实验室 Image shallow compression code rate control method and device
CN116506617B (en) * 2023-06-28 2023-09-12 鹏城实验室 Image shallow compression code rate control method and device

Also Published As

Publication number Publication date
WO2023011002A1 (en) 2023-02-09

Similar Documents

Publication Publication Date Title
CN113780549A (en) Overflow-aware quantization model training method, apparatus, medium and terminal device
US11373087B2 (en) Method and apparatus for generating fixed-point type neural network
CN108510067B (en) Convolutional neural network quantification method based on engineering realization
CN110929865B (en) Network quantification method, service processing method and related product
TW201918939A (en) Method and apparatus for learning low-precision neural network
WO2019238029A1 (en) Convolutional neural network system, and method for quantifying convolutional neural network
CN107944545B (en) Computing method and computing device applied to neural network
CN110610237A (en) Quantitative training method and device of model and storage medium
WO2020001401A1 (en) Operation method and apparatus for network layer in deep neural network
TWI744724B (en) Method of processing convolution neural network
CN112508125A (en) Efficient full-integer quantization method of image detection model
US20220004884A1 (en) Convolutional Neural Network Computing Acceleration Method and Apparatus, Device, and Medium
CN111178258B (en) Image identification method, system, equipment and readable storage medium
CN112990438A (en) Full-fixed-point convolution calculation method, system and equipment based on shift quantization operation
CN113344170A (en) Neural network weight matrix adjusting method, writing control method and related device
CN114418057A (en) Operation method of convolutional neural network and related equipment
CN111344719A (en) Data processing method and device based on deep neural network and mobile device
CN113780523A (en) Image processing method, image processing device, terminal equipment and storage medium
CN115496181A (en) Chip adaptation method, device, chip and medium of deep learning model
CN115564987A (en) Training method and application of image classification model based on meta-learning
CN112561050B (en) Neural network model training method and device
CN114139678A (en) Convolutional neural network quantization method and device, electronic equipment and storage medium
CN110276448B (en) Model compression method and device
CN114444688A (en) Neural network quantization method, apparatus, device, storage medium, and program product
CN110764696B (en) Vector information storage and updating method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination