CN116127317A - Quantization parameter updating method, quantization parameter updating device, electronic equipment and readable storage medium


Info

Publication number
CN116127317A
CN116127317A
Authority
CN
China
Prior art keywords
quantization parameter
updated
data
weight
quantization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310066250.4A
Other languages
Chinese (zh)
Inventor
陈腊梅
王凡祎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oppo Chongqing Intelligent Technology Co Ltd
Original Assignee
Oppo Chongqing Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo Chongqing Intelligent Technology Co Ltd filed Critical Oppo Chongqing Intelligent Technology Co Ltd
Priority to CN202310066250.4A priority Critical patent/CN116127317A/en
Publication of CN116127317A publication Critical patent/CN116127317A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Stored Programmes (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a quantization parameter updating method, a quantization parameter updating apparatus, an electronic device and a readable storage medium. The quantization parameter updating method comprises the following steps: obtaining a model to be quantized; inserting, into the weights of the model to be quantized, a first pseudo quantization node with a weight quantization parameter to be updated; inserting, into the activation layer of the model to be quantized, a second pseudo quantization node with an already updated activation quantization parameter; taking the model to be quantized provided with the first pseudo quantization node and the second pseudo quantization node as the target model to be quantized, and obtaining, through the target model to be quantized, the calculation result of sample data as first data; and updating the weight quantization parameter to be updated based on the difference between the first data and target data to obtain the target weight quantization parameter, wherein the target data is the standard data corresponding to the sample data. In the embodiments of the application, because the second pseudo quantization node inserted into the activation layer carries an already updated activation quantization parameter, only the weight quantization parameter to be updated needs to be updated, which reduces the memory requirement.

Description

Quantization parameter updating method, quantization parameter updating device, electronic equipment and readable storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly, to a quantization parameter updating method, apparatus, electronic device, and readable storage medium.
Background
Currently, with the development of artificial intelligence technology, models can be trained using quantization-aware training with learnable quantization parameters. However, quantization-aware training with learnable quantization parameters increases the memory requirement during training, which can lead to a shortage of memory resources.
Disclosure of Invention
The application provides a quantization parameter updating method, a quantization parameter updating device, electronic equipment and a readable storage medium, so as to reduce the memory requirement in the training process of a model.
In a first aspect, an embodiment of the present application provides a quantization parameter updating method, where the method includes: obtaining a model to be quantized; inserting a first pseudo quantization node with a weight quantization parameter to be updated into the weight of the model to be quantized; inserting a second pseudo quantization node with updated activation quantization parameters into the activation layer of the model to be quantized; taking the model to be quantized provided with the first pseudo quantization node and the second pseudo quantization node as a target model to be quantized, and obtaining a calculation result of sample data through the target model to be quantized as first data; updating the weight quantization parameter to be updated based on the difference between the first data and target data to obtain the target weight quantization parameter, wherein the target data is standard data corresponding to the sample data.
In a second aspect, an embodiment of the present application further provides a quantization parameter updating apparatus, where the apparatus includes: an acquisition unit, a first insertion unit, a second insertion unit, a calculation unit and an updating unit. The acquisition unit is configured to obtain the model to be quantized; the first insertion unit is configured to insert, into the weights of the model to be quantized, a first pseudo quantization node with a weight quantization parameter to be updated; the second insertion unit is configured to insert, into the activation layer of the model to be quantized, a second pseudo quantization node with an already updated activation quantization parameter; the calculation unit is configured to take the model to be quantized provided with the first pseudo quantization node and the second pseudo quantization node as the target model to be quantized, and to obtain, through the target model to be quantized, the calculation result of sample data as first data; and the updating unit is configured to update the weight quantization parameter to be updated based on the difference between the first data and target data to obtain the target weight quantization parameter, where the target data is the standard data corresponding to the sample data.
In a third aspect, an embodiment of the present application further provides an electronic device, including: one or more processors; a memory; one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of the first aspect.
In a fourth aspect, embodiments of the present application also provide a computer readable storage medium having stored therein program code that is callable by a processor to perform the method of the first aspect described above.
In the method, a first pseudo quantization node with a weight quantization parameter to be updated is inserted at the weights of the model to be quantized, and a second pseudo quantization node with an already updated activation quantization parameter is inserted at the activation layer of the model to be quantized. The weight quantization parameter to be updated can therefore be updated based on the difference between the first data and the target data, yielding the target weight quantization parameter. If the quantization parameter corresponding to the activations also had to be updated, the memory requirement when training the model to be quantized would be high. In the embodiments provided by the application, the activation layer receives a second pseudo quantization node whose activation quantization parameter is already updated, so only the weight quantization parameter to be updated needs to be updated, which reduces the memory requirement when training the model to be quantized.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 shows an application scenario diagram of a quantization parameter updating method provided in an embodiment of the present application;
FIG. 2 is a flowchart of a method for updating quantization parameters according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for updating quantization parameters according to another embodiment of the present application;
fig. 4 is a block diagram showing a structure of a quantization parameter updating apparatus according to an embodiment of the present application;
fig. 5 shows a block diagram of an electronic device according to an embodiment of the present application;
FIG. 6 shows a block diagram of a computer-readable storage medium provided by an embodiment of the present application;
fig. 7 shows a block diagram of a computer program product provided by an embodiment of the present application.
Detailed Description
In order to better understand the embodiments of the present application, the following description will clearly and completely describe the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
Currently, with the development of artificial intelligence technology, models can be trained using quantization-aware training with learnable quantization parameters. However, quantization-aware training with learnable quantization parameters increases the memory requirement during training, which can lead to a shortage of memory resources.
Although the computation required by a model is continually being compressed as artificial intelligence technology evolves, it is still significant for mobile terminals, for example on the order of hundreds of MFLOPs. Because the range of the weight parameters of each layer is largely fixed and fluctuates little, the weights are well suited to quantization, and quantization can reduce both the memory requirement and the amount of computation.
Quantization is widely used in convolutional neural networks as a compression technique. As one example, the memory and computational requirements may be reduced by quantizing both the weights and the activations of a model from 32-bit floating-point numbers to low-bit fixed-point numbers, thereby replacing the original floating-point matrix multiplications with low-bit fixed-point matrix multiplications.
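As an illustration of this idea, the following sketch quantizes a float32 weight matrix to int8 and recovers an approximation of it; the function names and the max-based scale are assumptions made for the example, not the scheme claimed by the patent:

```python
import numpy as np

def quantize(r, s, z, qmin=-128, qmax=127):
    # q = round(r / s) + z, clipped to the int8 range
    return np.clip(np.round(r / s) + z, qmin, qmax).astype(np.int8)

def dequantize(q, s, z):
    # r0 = s * (q - z)
    return s * (q.astype(np.float32) - z)

w = np.random.randn(4, 4).astype(np.float32)
s, z = np.abs(w).max() / 127.0, 0      # symmetric max-mapping scale, zero bias
w_int8 = quantize(w, s, z)             # low-bit fixed-point representation
w_back = dequantize(w_int8, s, z)      # close to w, up to rounding error
```

Low-bit matrix multiplications can then operate on tensors like w_int8 directly, which is where the memory and compute savings come from.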
In the prior art, quantization can be classified into post-training quantization (Post-Training Quantization) and quantization-aware training (Quantization-Aware Training), according to whether fine-tuning is performed after quantization. With quantization-aware training, the model to be trained can be trained so that both the quantization parameters corresponding to the weights and the quantization parameters corresponding to the activation layer are updated.
However, the inventors found in their research that current quantization-aware training with learnable quantization parameters increases the memory requirement during model training, resulting in a shortage of memory resources.
Accordingly, in order to overcome the above-mentioned drawbacks, the present application provides a quantization parameter updating method, apparatus, electronic device, and readable storage medium.
Referring to fig. 1, fig. 1 shows an application scenario diagram of a quantization parameter updating method, namely a quantization parameter updating scenario 100, where the quantization parameter updating scenario 100 includes an electronic device 110 and a server 120, and the electronic device 110 is connected to the server 120.
The electronic device 110 may establish a connection with a server 120 that also has internet access. The electronic device 110 may access the internet wirelessly, for example via Wi-Fi or Bluetooth, or by wired means, for example via an RJ45 network cable or optical fiber.
The user may control the electronic device 110 so that the electronic device performs the quantization parameter updating method, and the detailed description will refer to the following embodiments. For example, the user may directly operate the electronic device 110, thereby controlling the electronic device to perform the quantization parameter updating method; the user may also operate the server 120 that has established a communication connection with the electronic device 110, thereby controlling the electronic device to perform the quantization parameter updating method through the server 120. The server 120 may be a cloud server or a local server.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for updating a quantization parameter according to an embodiment of the present application, where the method for updating a quantization parameter may be applied to the electronic device 110 illustrated in fig. 1, and specifically may use a processor in the electronic device 110 as an execution body. The quantization parameter updating method specifically includes steps S110 to S150.
Step S110: obtaining a model to be quantized.
In practice, various recognition tasks may be encountered, each with different requirements, so a model to be quantized can be constructed as needed. The model to be quantized may be designed according to the task to be executed; for example, it may have a convolutional neural network structure. The model to be quantized may comprise the forward propagation graph and the backward propagation graph designed for the task to be executed.
Step S120: inserting, into the weights of the model to be quantized, a first pseudo quantization node with a weight quantization parameter to be updated.
Step S130: inserting, into the activation layer of the model to be quantized, a second pseudo quantization node with an already updated activation quantization parameter.
In some embodiments, pseudo quantization nodes may be inserted into the model to be quantized, thereby enabling quantization-aware training of the model. Specifically, the quantization parameters at the positions where pseudo quantization nodes are inserted may be adjusted through quantization-aware training. For example, if a node with an activation quantization parameter is inserted at the activation layer and a node with a weight quantization parameter is inserted at the weights, both the activation quantization parameter and the weight quantization parameter may be updated during quantization-aware training.
However, during quantization-aware training with learnable quantization parameters, pseudo quantization nodes with learnable quantization parameters are generally inserted for both the weights and the activations of the model to be quantized, so that both the weight quantization parameters corresponding to the weights and the activation quantization parameters corresponding to the activations are updated; specifically, both updates are performed on every backpropagation pass of the model to be quantized. Moreover, in parameter-learnable quantization-aware training, in order for the network to backpropagate, the data entering each pseudo quantization node and the data leaving it must be saved. Because pseudo quantization nodes are inserted for both the weights and the activations, the data before and after the first pseudo quantization node (for the weights) must be saved during backpropagation, and the data before and after the second pseudo quantization node (for the activations) must be saved as well.
For example, if the training task of the model to be quantized requires 17 GB of memory, performing parameter-learnable quantization-aware training on the same model may require 35 GB.
The memory required to train the activation quantization parameters is larger than that required to train the weight quantization parameters. Therefore, to reduce the memory requirement of quantization-aware training, when inserting pseudo quantization nodes, a first pseudo quantization node with a weight quantization parameter to be updated may be inserted at the weights, while a second pseudo quantization node with an already updated activation quantization parameter may be inserted at the activation layer.
Experiments show that performing quantization-aware training with a second pseudo quantization node carrying an already updated activation quantization parameter inserted at the activations yields a first accuracy of the model output, while inserting a second pseudo quantization node whose activation quantization parameter is still to be trained, updating that parameter, and performing quantization-aware training yields a second accuracy. The first accuracy does not differ much from the second. It can thus be seen that inserting a second pseudo quantization node with an already updated activation quantization parameter in the activation layer does not degrade the model output accuracy.
A pseudo quantization node is a node inserted during quantization-aware training of the model to be quantized; it is used to find the distribution of the network data and to feed the precision loss back. Finding the distribution of the network data means finding the maximum and minimum values of the parameters to be quantized; the precision loss of simulated quantization to low bits is applied to the network model and propagated into the loss function, so that the optimizer can optimize the loss value during training. Quantization-aware training simplifies the model by replacing high-precision data with low-precision data during training. This process inevitably introduces a loss of precision; the pseudo quantization node is used to simulate this introduced loss, which is then reduced by backpropagation learning.
For example, after a pseudo quantization node is inserted into the model to be quantized, the data at the insertion position can be quantized and dequantized by the quantization parameter of that node. If a first pseudo quantization node is inserted at the weights of the model to be quantized, it may include a weight quantization parameter to be updated, so that the weights can be quantized and dequantized by that parameter; if a second pseudo quantization node is inserted at the activation layer, it may include an already updated activation quantization parameter, so that the activations can be quantized and dequantized by that parameter.
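A minimal sketch of such a node is given below; the module and parameter names are illustrative assumptions, not the patent's code. The step size is a trainable parameter for the first (weight) node and a frozen buffer for the second (activation) node:

```python
import torch
import torch.nn as nn

class FakeQuant(nn.Module):
    """Pseudo quantization node: quantize then dequantize in the forward pass."""
    def __init__(self, step, qn, qp, learnable):
        super().__init__()
        t = torch.tensor(float(step))
        if learnable:
            self.step = nn.Parameter(t)      # weight quantization parameter to be updated
        else:
            self.register_buffer("step", t)  # already updated activation quantization parameter
        self.qn, self.qp = qn, qp

    def forward(self, x):
        q = torch.clamp(x / self.step, self.qn, self.qp)
        q = q + (q.round() - q).detach()     # round with a straight-through gradient (see STE below)
        return q * self.step                 # dequantize back to floating point
```

Because the frozen step is a buffer rather than a parameter, autograd does not build or store the state needed to update it, which is the source of the memory saving described above.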
Step S140: taking the model to be quantized provided with the first pseudo quantization node and the second pseudo quantization node as the target model to be quantized, and obtaining, through the target model to be quantized, the calculation result of sample data as first data.
It will be appreciated that a loss of precision arises after parameters or data are quantized and dequantized by quantization parameters. Providing the model to be quantized with the first pseudo quantization node and the second pseudo quantization node therefore allows this precision loss to be taken into account during training, so that the weights and the weight quantization parameter to be updated can be adjusted during backpropagation, improving the accuracy of the target model obtained after training.
Thus, the model to be quantized provided with the first pseudo quantization node and the second pseudo quantization node can be taken as the target model to be quantized, and the calculation result of the sample data obtained through it can be taken as the first data. The sample data may be obtained from databases published on the internet; the specific sample data may be determined according to the data type to be processed by the task to be executed, which is not specifically limited in the embodiments of the present application.
Step S150: updating the weight quantization parameter to be updated based on the difference between the first data and target data to obtain the target weight quantization parameter, wherein the target data is the standard data corresponding to the sample data.
After the first data corresponding to the sample data is obtained through the target model to be quantized, the standard data corresponding to the sample data can be obtained, and the weight quantization parameter to be updated can then be updated through backpropagation based on the difference between the first data and the target data.
Since the second pseudo quantization node inserted in the activation layer includes an already updated activation quantization parameter, the activation quantization parameter need not be updated during backpropagation, so the memory requirement can be reduced.
Optionally, in addition to updating the weight quantization parameter to be updated, the weights themselves may be updated, so as to reduce the difference between the first data and the target data.
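The following sketch pulls steps S140 and S150 together into one training pass, using the FakeQuant module sketched earlier; the layer sizes, data and learning rate are placeholders of mine. Only the weights and the learnable weight step reach the optimizer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(8, 4)
w_fq = FakeQuant(step=0.05, qn=-128, qp=127, learnable=True)  # first node, on the weight
a_fq = FakeQuant(step=0.10, qn=0, qp=255, learnable=False)    # second node, on the activation

opt = torch.optim.SGD([layer.weight, layer.bias, w_fq.step], lr=1e-3)

x = torch.randn(32, 8)          # sample data
target = torch.randn(32, 4)     # standard data corresponding to the sample data
out = a_fq(F.relu(F.linear(x, w_fq(layer.weight), layer.bias)))  # first data
loss = F.mse_loss(out, target)  # difference between first data and target data
opt.zero_grad()
loss.backward()
opt.step()                      # updates the weights and the weight step only
```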
The weight quantization parameter to be updated may follow different quantization strategies; for example, the strategy may be symmetric quantization or asymmetric quantization. In the embodiments provided herein, when the quantization strategy of the weight quantization parameter to be updated is symmetric quantization, the weight quantization parameter to be updated may include a quantization step size, i.e. the weights can be quantized and dequantized by the quantization step size. When the quantization strategy is asymmetric quantization, the weight quantization parameter to be updated may include a quantization step size and a quantization bias.
A number may be quantized and dequantized by these quantization parameters. For asymmetric quantization, the quantized number may be expressed, for example, as

$$q = \mathrm{round}(r / s) + z$$

where $r$ is the number to be quantized, for example a floating-point number; $s$ is the quantization step size; $z$ is the quantization bias; and round is a rounding function. The quantized number may then be dequantized as $r_0 = s \cdot (q - z)$, where $r_0$, the number obtained by dequantizing $q$, may again be a floating-point number.
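A worked instance of these two formulas, with values chosen purely for illustration:

```python
r, s, z = 0.53, 0.1, 3
q = round(r / s) + z   # round(5.3) + 3 = 8
r0 = s * (q - z)       # 0.1 * (8 - 3) = 0.5, close to the original 0.53
```

The gap between r and r0 (here 0.03) is the rounding error that the pseudo quantization nodes expose to training.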
According to the quantization parameter updating method described above, a first pseudo quantization node with a weight quantization parameter to be updated is inserted at the weights of the model to be quantized, and a second pseudo quantization node with an already updated activation quantization parameter is inserted at the activation layer. The weight quantization parameter to be updated can then be updated based on the difference between the first data and the target data, yielding the target weight quantization parameter. If the quantization parameter corresponding to the activations also had to be updated, the memory requirement when training the model to be quantized would be high. In the embodiments provided by the application, because the first pseudo quantization node carries the weight quantization parameter to be updated and the second pseudo quantization node carries an already updated activation quantization parameter, each backpropagation pass of the model to be quantized only needs to update the weight quantization parameter, not the activation quantization parameter. This reduces the memory requirement and alleviates the shortage of memory resources. Furthermore, since the activation quantization parameter need not be updated during training, the amount of computation is reduced and training is accelerated.
Referring to fig. 3, fig. 3 is a flowchart illustrating a method for updating a quantization parameter according to an embodiment of the present application, where the method for updating a quantization parameter may be applied to the electronic device 110 illustrated in fig. 1, and specifically may use a processor in the electronic device 110 as an execution body. The quantization parameter updating method specifically includes steps S210 to S270.
Step S210: obtaining a model to be quantized.
Step S220: inserting, into the weights of the model to be quantized, a first pseudo quantization node with a weight quantization parameter to be updated.
Step S230: inserting, into the activation layer of the model to be quantized, a second pseudo quantization node with an already updated activation quantization parameter.
Step S240: taking the model to be quantized provided with the first pseudo quantization node and the second pseudo quantization node as the target model to be quantized, and obtaining, through the target model to be quantized, the calculation result of sample data as first data.
It is readily understood that the updated activation quantization parameter may be obtained before the second pseudo quantization node is inserted into the model to be quantized. Specifically, an initial quantization parameter may be determined by some algorithm and used as the updated activation quantization parameter. For example, the initial quantization parameter may be determined by the minimum mean squared error method (Minimum Mean Squared Error, MMSE); by maximum-value mapping; by saturating truncation mapping; or by relative entropy.
Alternatively, several initial quantization parameters may be determined in several of these ways, and the most reasonable one among them selected as the initial quantization parameter.
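As a concrete reading of two of the listed options, the sketch below computes a max-mapping step and an MMSE step for an activation tensor; the formulations and the brute-force search are my assumptions, not the patent's prescribed algorithms:

```python
import numpy as np

def init_step_max(x, qp=127):
    # maximum-value mapping: map the largest magnitude onto the largest level
    return np.abs(x).max() / qp

def init_step_mmse(x, qp=127, candidates=100):
    # brute-force MMSE: pick the step whose quantize-dequantize error is smallest
    best_s, best_err = None, np.inf
    for k in range(1, candidates + 1):
        s = init_step_max(x, qp) * k / candidates
        xq = np.clip(np.round(x / s), -qp - 1, qp) * s
        err = np.mean((x - xq) ** 2)
        if err < best_err:
            best_s, best_err = s, err
    return best_s
```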
For a detailed description of step S210 to step S240, reference may be made to the description of the foregoing embodiments, and the detailed description is omitted here.
Step S250: quantizing and dequantizing the weights by the weight quantization parameter to be updated to obtain second data.
Step S260: obtaining the derivative of the second data with respect to the weight quantization parameter to be updated.
Step S270: updating the derivative based on the difference between the first data and the target data to obtain the target weight quantization parameter.
In some embodiments, the adjustment of the weight quantization parameter to be updated may be achieved through the derivative, with respect to that parameter, of the second data. The second data may be obtained by quantizing and dequantizing the weights with the weight quantization parameter to be updated.
Optionally, step S251 and step S252 may also be included when step S250 is performed.
Step S251: quantizing the weights by the weight quantization parameter to be updated to obtain an intermediate number.
Step S252: dequantizing the intermediate number by the weight quantization parameter to be updated to obtain the second data.
In the process of training the model to be quantized, the weight quantization parameter to be updated can be updated through multiple backpropagation passes. Specifically, in each backpropagation pass, the derivative, with respect to the weight quantization parameter to be updated, of the second data obtained by quantizing and dequantizing the weights with that parameter can be obtained and updated, thereby training the model to be quantized.
Illustratively, quantizing the weights by the weight quantization parameter to be updated may be represented by the following formula:

$$\bar{v} = \mathrm{round}\big(\mathrm{clip}(v / s,\; Q_N,\; Q_P)\big)$$

and dequantizing the intermediate number by the weight quantization parameter to be updated, so that the second data can be represented by the following formula:

$$\hat{v} = \bar{v} \times s$$

where $v$ is the floating-point input, namely the weight; $\bar{v}$ is the data obtained by quantizing $v$, namely the intermediate number; $\hat{v}$ is the data obtained by dequantizing $\bar{v}$, namely the second data; $Q_N$ and $Q_P$ are, respectively, the minimum and maximum quantization values of the quantization range (in symmetric quantization their magnitudes are typically equal); and $s$ is the weight quantization parameter to be updated, e.g. in symmetric quantization $s$ may be the quantization step size.
Further, a derivative of the second data with respect to the weight quantization parameter to be updated may be obtained, so that the derivative may be updated based on a difference between the first data and the target data, to obtain the target weight quantization parameter.
The target weight quantization parameter may be the weight quantization parameter to be updated as it stands when updating of the derivative stops because a specified condition is satisfied.
Further, substituting the quantization into the dequantization unifies the two formulas, so that the second data can be written as

$$\hat{v} = \mathrm{round}\big(\mathrm{clip}(v / s,\; Q_N,\; Q_P)\big) \times s$$

The derivative of the second data $\hat{v}$ with respect to the weight quantization parameter $s$ can then be found, yielding the following equation:

$$\frac{\partial \hat{v}}{\partial s} = \begin{cases} -\,v/s + \mathrm{round}(v/s), & Q_N < v/s < Q_P \\ Q_N, & v/s \le Q_N \\ Q_P, & v/s \ge Q_P \end{cases}$$
the round function is not differentiable, and the local differentiable part gradients are all 0, so that the back propagation algorithm cannot update the parameters. Therefore, according to the method (Straight Through Estimator, STE), the gradient can be directly returned to the weight before round, so that the weight can be updated, and the training of the model to be quantized can be normally performed.
As an example, assume the quantization range is fixed to the interval $[0, 3]$, i.e. let $Q_N = 0$ and $Q_P = 3$, and set the initial $s = 1$. Because the round function rounds to the nearest value, the second data obtained by quantizing and dequantizing with the weight quantization parameter to be updated changes abruptly at certain designated points, which may be, for example, 0.5, 1.5 and 2.5. The derivative should therefore also change sharply at those points. Since the derivative obtained above does exhibit large changes at the designated points, it is reasonable for quantization-aware training to update the weight quantization parameter to be updated with it.
Alternatively, since the above derivative is updated by backpropagation, and the weights of the model to be quantized can also be updated by backpropagation, the weight quantization parameter to be updated can be constrained by a scaling factor so that its update does not differ too much from the update of the weights. As one example, the constraint on the weight quantization parameter to be updated may be given by the following formula:

$$R = \frac{\partial_s L \,/\, s}{\lVert \partial_w L \rVert \,/\, \lVert w \rVert}$$

where $\partial_s L$ is the derivative of $L$ with respect to the weight quantization parameter to be updated; $L$ denotes the loss function (Loss Function); and $\lVert w \rVert$ denotes the $l_2$ norm of $w$, i.e. its $l_2$ regularization term. The closer $R$ is to 1, that is, the closer the average update magnitude of the weight quantization parameter to be updated during backpropagation is to the average update magnitude of the weights, the more favorable it is for stable convergence. The average update magnitude of the weight quantization parameter to be updated can be regarded as the mean of its updates over multiple backpropagation passes; similarly, the average update magnitude of a weight can be regarded as the mean of its updates over multiple backpropagation passes.
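The sketch below is my formulation of how R could be monitored during training; the helper name and the use of gradient norms for the scalar step are assumptions:

```python
import torch

def update_ratio(loss, w, s):
    # relative update of the step size s versus relative update of the weight w
    grad_w, grad_s = torch.autograd.grad(loss, [w, s], retain_graph=True)
    step_term = grad_s.norm() / s.detach().abs()
    weight_term = grad_w.norm() / w.detach().norm()
    return (step_term / weight_term).item()
```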
In practice, however, for some models to be quantized the corresponding $R$ is relatively large, i.e. the update magnitude of the weight quantization parameter to be updated is relatively large, which easily causes repeated overshooting of local minima and prolongs the convergence time.
Therefore, optionally, the weight quantization parameter to be updated may also be limited by a gradient scaling coefficient, so as to reduce its update magnitude. Specifically, the update magnitude may be reduced by multiplying the gradient of the weight quantization parameter to be updated by the gradient scaling coefficient, which may be formulated, for example, as

$$g = \frac{1}{\sqrt{N_W \cdot Q_P}}$$

where $N_W$ denotes the total number of elements of the matrix corresponding to the weight.
If $g = 1/\sqrt{N_W}$ is used, $R$ can be reduced effectively, but large differences then appear between quantization at different bit widths. For example, the $R$ obtained under 2-bit quantization and the $R$ obtained under 8-bit quantization may differ by an order of magnitude. If $g = 1/\sqrt{N_W \cdot Q_P}$ is used instead, the difference between bit widths remains small while the suppression effect stays good, for 2-bit, 8-bit and so on.
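To make the comparison concrete, the following sketch evaluates the two candidate scales for an assumed 3x3x64x64 convolution weight; the function name and values are mine:

```python
import math

def grad_scale(n_w, qp=None):
    # 1/sqrt(N_W) if qp is None, else 1/sqrt(N_W * Q_P)
    return 1.0 / math.sqrt(n_w if qp is None else n_w * qp)

n_w = 3 * 3 * 64 * 64
g_plain = grad_scale(n_w)              # same for every bit width
g_2bit = grad_scale(n_w, qp=2**1 - 1)  # Q_P = 1 under 2-bit symmetric quantization
g_8bit = grad_scale(n_w, qp=2**7 - 1)  # Q_P = 127 under 8-bit symmetric quantization
```

Folding $Q_P$ into the scale is what keeps the effective step-size update comparable across bit widths.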
Further, when executing step S270, the method may further include: if updating the derivative based on the difference between the first data and the target data satisfies a specified condition, stopping the update of the derivative and taking the weight quantization parameter to be updated, as updated so far, as the target weight quantization parameter.
The specified condition that updating the derivative must satisfy may be preset, and it is readily understood that different specified conditions may be preset according to different parameters. For example, the specified condition may be set according to the loss function, such as convergence of the loss function; as another example, it may be set according to the number of iterations of the model to be quantized, such as the iteration count reaching a target number; as yet another example, it may be set according to the number of training epochs of the model to be quantized, such as the epoch count reaching a target number.
Thus, in some embodiments, if the number of iterations of updating the derivative based on the difference between the first data and the target data satisfies the target number of iterations, updating the derivative is stopped, and the updated weight quantization parameter to be updated is taken as the target weight quantization parameter.
The target number may be determined by measuring in advance the number of iterations required to train the model until the loss function converges, and then taking that number as the target number. When the model to be quantized is subsequently trained, once the iteration count reaches the target number, the loss function has converged with high probability.
It is easy to understand that after the first pseudo quantization node is inserted at the weights of the model to be quantized, an initial value may be set for the weight quantization parameter to be updated, so that when the model to be quantized undergoes quantization-aware training, the initial forward propagation can be performed based on this initial value.
Similarly to the way the updated activation quantization parameter is obtained, the initial value of the weight quantization parameter to be updated may be determined based on at least one of the minimum mean squared error method, maximum-value mapping, saturating truncation mapping, and relative entropy.
Optionally, the model to be quantized may be preprocessed before it is trained. Preprocessing can reduce the time required to train the model to be quantized.
According to the quantization parameter updating method described above, a first pseudo quantization node with a weight quantization parameter to be updated is inserted at the weights of the model to be quantized, and a second pseudo quantization node with an already updated activation quantization parameter is inserted at the activation layer. The weights are quantized and dequantized by the weight quantization parameter to be updated to obtain second data; the derivative of the second data with respect to the weight quantization parameter to be updated is obtained; and the derivative is updated based on the difference between the first data and the target data to obtain the target weight quantization parameter. If the quantization parameter corresponding to the activations also had to be updated, the memory requirement when training the model to be quantized would be high. In the embodiments provided by the application, because the first pseudo quantization node carries the weight quantization parameter to be updated and the second pseudo quantization node carries an already updated activation quantization parameter, each backpropagation pass only needs to update the weight quantization parameter, not the activation quantization parameter; the activation quantization parameter therefore need not be saved for backpropagation, which reduces the memory requirement and alleviates the shortage of memory resources. Furthermore, since the activation quantization parameter need not be updated during training, the amount of computation is reduced and training is accelerated. Finally, in the embodiments provided by the application, updating of the derivative stops once the specified condition is satisfied, which avoids unnecessary training.
Referring to fig. 4, fig. 4 is a block diagram illustrating a quantization parameter updating apparatus 400 according to an embodiment of the present application, where the quantization parameter updating apparatus 400 includes: an acquisition unit 410, a first insertion unit 420, a second insertion unit 430, a calculation unit 440, and an updating unit 450.
An obtaining unit 410 is configured to obtain a model to be quantized.
The first insertion unit 420 is configured to insert, into the weights of the model to be quantized, a first pseudo quantization node having a weight quantization parameter to be updated.
The second insertion unit 430 is configured to insert, into the activation layer of the model to be quantized, a second pseudo quantization node with an already updated activation quantization parameter, where the updated activation quantization parameter is determined based on at least one of the minimum mean squared error method, maximum-value mapping, saturating truncation mapping, and relative entropy.
The calculating unit 440 is configured to take the model to be quantized, in which the first pseudo quantization node and the second pseudo quantization node are disposed, as a target model to be quantized, and obtain a result of calculating sample data as first data by using the target model to be quantized.
The updating unit 450 is configured to update the weight quantization parameter to be updated based on the difference between the first data and target data to obtain the target weight quantization parameter, where the target data is the standard data corresponding to the sample data. The quantization strategy of the weight quantization parameter to be updated includes symmetric quantization, and the weight quantization parameter to be updated includes a quantization step size.
Further, the updating unit 450 may be further configured to quantize and dequantize the weights by the weight quantization parameter to be updated to obtain second data; to acquire the derivative of the second data with respect to the weight quantization parameter to be updated; and to update the derivative based on the difference between the first data and the target data to obtain the target weight quantization parameter.
Further, the updating unit 450 may be further configured to quantize the weight through the weight quantization parameter to be updated to obtain an intermediate number; and dequantizing the intermediate number through the weight quantization parameter to be updated to obtain the second data.
Further, the updating unit 450 may be further configured to stop updating the derivative if the derivative is updated to meet a specified condition based on a difference between the first data and the target data, and take the updated weight quantization parameter to be updated as the target weight quantization parameter.
Further, the updating unit 450 may be further configured to stop updating the derivative if the number of iterations for updating the derivative based on the difference between the first data and the target data meets the target number, and take the updated weight quantization parameter to be updated as the target weight quantization parameter.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and unit described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
In several embodiments provided herein, the coupling of the units to each other may be electrical, mechanical, or other. In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Referring to fig. 5, fig. 5 shows a block diagram of an electronic device 110 according to an embodiment of the present application. The electronic device 110 may be a smart phone, a notebook computer, a desktop computer, a tablet computer, a wireless headset, etc. The electronic device 110 in this application may include one or more of the following components: a processor 111, a memory 112, and one or more application programs, wherein the processor 111 is electrically connected to the memory 112, and the one or more application programs are configured to be executed by the processor 111 to perform the method described in the foregoing quantization parameter updating method embodiments.
Processor 111 may include one or more processing cores. The processor 111 connects the various parts of the electronic device 110 using various interfaces and lines, and performs the various functions of the electronic device 110 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 112 and by invoking data stored in the memory 112. Alternatively, the processor 111 may be implemented in at least one hardware form among digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 111 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, computer programs, etc.; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 111 and may instead be implemented by a communication chip alone. The methods described in the previous embodiments may in particular be performed by the one or more processors 111.
For some embodiments, memory 112 may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). Memory 112 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 112 may include a stored-program area and a stored-data area, where the stored-program area may store instructions for implementing an operating system, instructions for implementing at least one function, instructions for implementing the various method embodiments described above, and the like. The stored-data area may store data created by the electronic device 110 in use, and the like.
Referring to fig. 6, a block diagram of a computer readable storage medium according to an embodiment of the present application is shown. The computer readable medium 600 has stored therein program code which can be invoked by a processor to perform the methods described in the method embodiments described above.
The computer readable storage medium 600 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium 600 comprises a non-volatile computer readable medium (non-transitory computer-readable storage medium). The computer readable storage medium 600 has storage space for program code 610 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. Program code 610 may be compressed, for example, in a suitable form.
Referring to fig. 7, a block diagram 700 of a computer program product according to an embodiment of the present application is shown. The computer program product 700 comprises a computer program/instructions 710, which computer program/instructions 710, when executed by a processor, implement the steps of the method described above.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A quantization parameter updating method, the method comprising:
obtaining a model to be quantized;
inserting a first pseudo quantization node with a weight quantization parameter to be updated into the weight of the model to be quantized;
inserting a second pseudo quantization node with updated activation quantization parameters into the activation layer of the model to be quantized;
taking the model to be quantized provided with the first pseudo quantization node and the second pseudo quantization node as a target model to be quantized, and obtaining a calculation result of sample data through the target model to be quantized as first data;
updating the weight quantization parameter to be updated based on the difference between the first data and target data to obtain the target weight quantization parameter, wherein the target data is standard data corresponding to the sample data.
2. The method according to claim 1, wherein updating the weight quantization parameter to be updated based on the difference between the first data and the target data to obtain the target weight quantization parameter comprises:
quantizing and dequantizing the weight through the weight quantization parameter to be updated to obtain second data;
acquiring the derivative of the second data with respect to the weight quantization parameter to be updated;
and updating the derivative based on the difference between the first data and the target data to obtain a target weight quantization parameter.
3. The method according to claim 2, wherein said quantizing and dequantizing said weights by said weight quantization parameter to be updated, obtaining second data, comprises:
quantizing the weight through the weight quantization parameter to be updated to obtain an intermediate number;
and dequantizing the intermediate number through the weight quantization parameter to be updated to obtain the second data.
4. The method of claim 2, wherein updating the derivative based on the difference between the first data and the target data to obtain the target weight quantization parameter comprises:
and if the derivative is updated to meet the specified condition based on the difference between the first data and the target data, stopping updating the derivative, and taking the updated weight quantization parameter to be updated as the target weight quantization parameter.
5. The method according to claim 4, wherein if the updating of the derivative based on the difference between the first data and the target data satisfies a specified condition, stopping the updating of the derivative, and taking the updated weight quantization parameter to be updated as the target weight quantization parameter, comprises:
and if the iteration number of updating the derivative meets the target number based on the difference between the first data and the target data, stopping updating the derivative, and taking the updated weight quantization parameter to be updated as the target weight quantization parameter.
6. The method according to any of claims 1-5, wherein the quantization strategy for the weight quantization parameter to be updated comprises symmetric quantization and the weight quantization parameter to be updated comprises quantization step size.
7. The method of claim 1, wherein the updated activation quantization parameter is determined based on at least one of a minimum mean square error approach, a maximum mapping approach, a saturation truncation mapping approach, and a relative entropy approach.
8. A quantization parameter updating apparatus, the apparatus comprising:
the acquisition unit is used for acquiring the model to be quantized;
the first insertion unit is used for inserting, into the weights of the model to be quantized, a first pseudo quantization node with a weight quantization parameter to be updated;
a second inserting unit, configured to insert a second pseudo quantization node with updated activation quantization parameters into the activation layer of the model to be quantized;
the computing unit is used for taking the model to be quantized provided with the first pseudo quantization node and the second pseudo quantization node as a target model to be quantized, and obtaining a computing result of sample data through the target model to be quantized as first data;
and the updating unit is used for updating the weight quantization parameter to be updated based on the difference between the first data and the target data to obtain the target weight quantization parameter, wherein the target data is standard data corresponding to the sample data.
9. An electronic device, comprising: one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a program code, which is callable by a processor for executing the method according to any one of claims 1-7.
CN202310066250.4A 2023-01-16 2023-01-16 Quantization parameter updating method, quantization parameter updating device, electronic equipment and readable storage medium Pending CN116127317A (en)

Priority Applications (1)

Application Number: CN202310066250.4A; Priority Date: 2023-01-16; Filing Date: 2023-01-16; Title: Quantization parameter updating method, quantization parameter updating device, electronic equipment and readable storage medium (CN116127317A, en)

Publications (1)

Publication Number: CN116127317A; Publication Date: 2023-05-16

Family

ID=86302488

Country Status (1)

Country Link
CN (1) CN116127317A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination