CN111783957A - Model quantitative training method and device, machine-readable storage medium and electronic equipment - Google Patents

Publication number
CN111783957A
Authority
CN
China
Prior art keywords
value
maximum value
layer
quantized
network
Prior art date
Legal status
Granted
Application number
CN202010634312.3A
Other languages
Chinese (zh)
Other versions
CN111783957B (en
Inventor
刘岩
曲晓超
姜浩
杨思远
万鹏飞
Current Assignee
Xiamen Meitu Technology Co Ltd
Original Assignee
Xiamen Meitu Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Meitu Technology Co Ltd filed Critical Xiamen Meitu Technology Co Ltd
Priority to CN202010634312.3A priority Critical patent/CN111783957B/en
Publication of CN111783957A publication Critical patent/CN111783957A/en
Application granted granted Critical
Publication of CN111783957B publication Critical patent/CN111783957B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a model quantization training method and apparatus, a machine-readable storage medium, and an electronic device. The method comprises: for each layer to be quantized in a model to be quantized, acquiring the network parameters of that layer, the network parameters being weight values or activation values in the network; judging whether the maximum value and the minimum value among the network parameters of the layer are equal, or whether the maximum value is smaller than a preset parameter threshold; if the maximum and minimum values are equal, or the maximum value is smaller than the parameter threshold, taking the sum of the maximum value and a preset value as the new maximum value; and performing network quantization according to the maximum and minimum parameter values of the layer to obtain a target model. The method solves the problem that quantization training fails when all values in a layer are 0, all values in a layer are equal, or the values in a layer are very close to 0, so that the model to be quantized can be quantized and trained normally.

Description

Model quantitative training method and device, machine-readable storage medium and electronic equipment
Technical Field
The application relates to the technical field of neural networks, and in particular to a model quantization training method and apparatus, a machine-readable storage medium, and an electronic device.
Background
With the rapid development of deep learning, the precision of deep learning models has continuously improved. However, the more accurate a deep learning model is, the more it tends to require high-performance GPUs to obtain, and such models also consume enormous hardware resources when deployed, making them unsuitable for mobile terminals and similar devices. At present, to apply a high-precision deep learning model on a mobile terminal, the model is usually quantized to obtain a version that can run on the mobile terminal. However, with current model quantization methods, when the network parameters of a certain layer in the model are all equal (for example, all equal to 0), or the parameters of a certain layer are very close to 0, the model cannot be quantized.
Disclosure of Invention
In order to overcome at least the above-mentioned deficiencies in the prior art, one objective of the present application is to provide a model quantization training method, comprising:
acquiring a network parameter of each layer to be quantized in the model to be quantized, wherein the network parameter is a weight value or an activation value in a network;
judging whether the maximum value and the minimum value in the network parameters of the layer to be quantized are equal, or whether the maximum value is smaller than a preset parameter threshold;
if the maximum value and the minimum value in the network parameters of the layer to be quantized are equal, or the maximum value is smaller than the parameter threshold, taking the sum of the maximum value and a preset value as a new maximum value;
and carrying out network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized to obtain a target model.
Optionally, the network quantization is performed according to the maximum value and the minimum value of the parameter in the layer to be quantized, and the process of obtaining the target model includes:
calculating the slope of the mapping according to the maximum value and the minimum value;
mapping the network parameters of the layer to be quantized into integers in a preset value interval according to the slope;
and inversely mapping the integer into a floating point number according to the slope to obtain a target model.
Optionally, the method for calculating the slope of the mapping according to the maximum value and the minimum value is:
s(a, b, n) = (b - a)/(n - 1)
wherein a is the minimum value of the network parameters in the layer to be quantized, b is the maximum value of the network parameters in the layer to be quantized, n is the number of integers in a preset value interval, and s (a, b, n) is the slope of mapping.
Optionally, when a maximum value and a minimum value in the network parameters of the layer to be quantized are equal, or the maximum value is smaller than the parameter threshold, the method for mapping the parameters of the layer to be quantized to integers q (r, a, b, n) within a preset value interval according to the slope includes:
clamp(r;a,b):=min(max(r,a),b)
q(r, a, b, n) := round((clamp(r; a, b) - a)/s(a, b, n))
b=a+m
when the maximum value and the minimum value in the network parameters of the layer to be quantized are not equal and the maximum value is greater than or equal to the parameter threshold, the method for mapping the parameters of the layer to be quantized into integers within a preset value interval according to the slope comprises the following steps:
clamp(r;a,b):=min(max(r,a),b)
q(r, a, b, n) := round((clamp(r; a, b) - a)/s(a, b, n))
wherein r is a network parameter in the layer to be quantized, clamp (r; a, b) is a truncation function, q (r, a, b, n) is the network parameter after r is mapped to a preset numerical interval, n is the number of integers in the preset numerical interval, and s (a, b, n) is the slope of mapping.
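As a sketch, the truncation-and-mapping scheme above can be written in Python. Because the original equation images are unavailable, the slope form s(a, b, n) = (b - a)/(n - 1) and round-to-nearest are assumptions taken from the standard simulated-quantization formulation that this clamp definition follows; all function names are illustrative:

```python
import numpy as np

def clamp(r, a, b):
    # clamp(r; a, b) := min(max(r, a), b)
    return np.minimum(np.maximum(r, a), b)

def slope(a, b, n):
    # assumed slope of the mapping: s(a, b, n) = (b - a)/(n - 1)
    return (b - a) / (n - 1)

def quantize(r, a, b, n):
    # map the clamped parameter onto the n integer levels of the interval
    return np.round((clamp(r, a, b) - a) / slope(a, b, n))
```

With n = 256 this yields integers in [0, 255]; the target model is then obtained by inversely mapping these integers back to floating point with the same slope.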
Optionally, the method further comprises:
obtaining a plurality of training samples;
and inputting the training samples into a pre-trained original network model for network training to obtain the model to be quantized.
Optionally, the method further comprises:
acquiring data to be processed;
and inputting the data to be processed into the target model for data processing to obtain a processing result of the data to be processed.
Another object of the present application is to provide a model quantization training apparatus, which includes:
the acquisition module is used for acquiring the network parameters of each layer to be quantized in the model to be quantized;
the judging module is used for judging whether the maximum value and the minimum value in the network parameters of the layer to be quantized are equal, or whether the maximum value is smaller than a preset parameter threshold;
the adjusting module is used for taking the sum of the maximum value and a preset value as a new maximum value when the maximum value and the minimum value in the network parameters of the layer to be quantized are equal or the maximum value is smaller than the parameter threshold value;
and the quantization module is used for carrying out network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized to obtain a target model.
Optionally, the quantization module is specifically configured to:
calculating the slope of the mapping according to the maximum value and the minimum value;
mapping the network parameters of the layer to be quantized into integers in a preset value interval according to the slope;
and inversely mapping the integer into a floating point number according to the slope to obtain a target model.
It is another object of the present application to provide a machine-readable storage medium storing an executable program which, when executed by a processor, implements a method as in any of the present applications.
Another object of the present application is to provide an electronic device, which includes a memory and a processor, the memory and the processor are electrically connected, the memory stores an executable program, and the processor, when executing the executable program, implements the method according to any of the present application.
Compared with the prior art, the method has the following beneficial effects:
In this embodiment, when the maximum value and the minimum value in the network parameters of the layer to be quantized are equal, or the maximum value is smaller than the parameter threshold, the sum of the maximum value and a preset value is used as the new maximum value. This increases the difference between the maximum and minimum values of the layer's network parameters, so that the layer can be quantized. It solves the problem that training fails during quantization training when all values in a layer are 0, all values in a layer are equal, or the values in a layer are very close to 0, enabling the model to be quantized and trained normally.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a block diagram schematically illustrating a structure of an electronic device according to an embodiment of the present disclosure;
FIG. 2 is a first flowchart illustrating a model quantization method according to an embodiment of the present disclosure;
Fig. 3 is a second schematic flowchart of the model quantization method provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of model training of a model quantization method provided in an embodiment of the present application;
fig. 5 is a block diagram schematically illustrating a structure of a model quantization apparatus according to an embodiment of the present application.
Reference numerals: 100-electronic device; 110-model quantization training apparatus; 111-acquisition module; 112-judgment module; 113-adjustment module; 114-quantization module; 120-memory; 130-processor.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present application, it is further noted that, unless expressly stated or limited otherwise, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.
In recent years, deep learning has developed rapidly. In 2012, Krizhevsky et al. adopted a deep learning algorithm and won the image classification competition with a lead of about 10% over the second-place entry, which used traditional hand-crafted features. Since then, computer vision competitions have largely been dominated by deep learning models. These high-precision deep learning models rely on deep networks with millions or even billions of parameters; traditional CPUs cannot cope with such huge networks, and only GPUs with high computing power can make network training reasonably fast. Furthermore, applying these deep learning models is very hardware-intensive.
For mobile terminals, hardware computing capability and available storage space are limited, and most models deployed on mobile terminals must run in real time, so the convolutional neural networks that perform well in current deep learning are not directly suitable for deployment on mobile terminals. To apply a deep learning model trained with a convolutional neural network on a mobile terminal, one common approach is to prune the model, reducing the number of network layers or the number of channels per layer, thereby reducing the model's size and computation; however, this degrades the performance of the deep learning model, and the effect is poor. Another approach is to quantize the deep learning model, that is, to convert the float-type parameters in the model to int8 type. This reduces the size of the model and, because float operations are replaced with int8 operations when the mobile terminal runs the model, also accelerates inference.
However, when the deep learning model is quantized, if all parameters of a certain layer in the model are equal (for example, all 0), or all parameters of a certain layer are close to 0, the deep learning model cannot be quantized.
In order to solve the problem that the deep learning model cannot be quantized in the prior art, an embodiment of the present application provides an electronic device 100.
Referring to fig. 1, fig. 1 is a schematic block diagram of a structure of an electronic device 100 according to an embodiment of the present disclosure, where the electronic device 100 includes a model quantitative training apparatus 110, a memory 120 and a processor 130, and the memory 120 and the processor 130 are electrically connected to each other directly or indirectly for implementing data interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The model quantitative training device 110 includes at least one software functional module which can be stored in the memory 120 in the form of software or Firmware (Firmware) or solidified in an Operating System (OS) of the electronic device 100. The processor 130 is used for executing executable modules stored in the memory 120, such as software functional modules and computer programs included in the model quantification training device 110.
The electronic device 100 in this embodiment may be a mobile terminal, such as a mobile phone.
In order to solve the problem that the deep learning model cannot be quantized in the prior art, an embodiment of the present application further provides a model quantization training method applicable to the electronic device 100, and please refer to fig. 2, where fig. 2 is a schematic flow diagram of the model quantization training method provided in the embodiment of the present application. The method comprises steps S110-S140.
Step S110, for each layer to be quantized in the model to be quantized, obtaining a network parameter of the layer to be quantized, where the network parameter is a weight value or an activation value in the network.
Step S120, determining whether the maximum value and the minimum value in the network parameters of the layer to be quantized are equal, or whether the maximum value is smaller than a preset parameter threshold.
Step S130, if the maximum value and the minimum value in the network parameters of the layer to be quantified are equal, or the maximum value is smaller than the parameter threshold, the sum of the maximum value and a preset value is taken as a new maximum value.
For example, when the minimum value a is equal to the maximum value b, the new maximum value is b = a + m, where m is a preset value.
And step S140, carrying out network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized to obtain a target model.
In this embodiment, the model to be quantized may be a convolutional neural network model. The layer to be quantized may be, but is not limited to, a convolutional layer (conv) or an activation layer (e.g., a ReLU6 layer); it may also be any other network layer that cannot be fused.
In this embodiment, when the maximum and minimum values among the network parameters of the layer to be quantized are equal (including the case where both are 0), or the maximum value is smaller than the parameter threshold, the sum of the maximum value and a preset value is used as the new maximum value. This increases the difference between the maximum and minimum values of the layer's network parameters, making quantization of the layer possible, and solves the problem that training fails during quantization training when all values in a layer are equal (for example, all 0) or very close to 0, so that the model to be quantized can be quantized and trained normally.
In addition, this embodiment also enables models with some redundancy to undergo simulated quantization training and to use the Int8 fixed-point forward framework.
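A minimal Python sketch of the range adjustment in steps S120-S130 follows. The threshold value and the preset m below are illustrative defaults; the patent leaves both as configurable presets:

```python
def adjust_range(a, b, m=1.0, threshold=1e-6):
    """Widen a degenerate quantization range.

    a, b: minimum and maximum of the layer's network parameters.
    If the maximum equals the minimum, or the maximum is below the
    preset threshold, the new maximum is b = a + m, so the mapping
    slope is nonzero and quantization training can proceed.
    """
    if b == a or b < threshold:
        b = a + m
    return a, b
```

For an all-zero layer, adjust_range(0.0, 0.0) returns (0.0, 1.0), so the layer maps onto a nonempty interval instead of producing a zero-width range.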
Referring to fig. 3, optionally, in this embodiment, the step S140 includes sub-steps S141 to S143.
In step S141, the slope of the map is calculated from the maximum value and the minimum value.
Step S142, mapping the network parameter of the layer to be quantized to an integer within a preset value interval according to the slope.
Specifically, in this embodiment, the floating-point network parameters of the layer to be quantized may first be mapped to values within a preset value interval and then rounded; rounding to the nearest integer may be adopted. By converting each floating-point number (floating-point data) to an integer (integer data) and then back to a floating-point number, the quantization loss of the Int8 fixed-point forward framework's calculation process is simulated during training, which in turn reduces that quantization loss.
And S143, inversely mapping the integer to a floating point number according to the slope, and obtaining the target model.
The process of inversely mapping the integer to a floating-point number according to the slope is the reverse of mapping the floating-point number into the preset value interval (before rounding), and is not described again here. The slope of the mapping is the mapping ratio, that is, the ratio between a network parameter's value in the mapped interval and the network parameter itself. In this embodiment, the slope is calculated from the maximum and minimum values. Because the maximum value is adjusted when the maximum and minimum values of the layer's network parameters are equal (including the case where both are 0) or the maximum value is smaller than the parameter threshold, the slope can in those cases be calculated from the minimum value and the adjusted maximum value. Network quantization can then proceed with the calculated slope, so the model to be quantized undergoes normal quantization training, with only a negligible error introduced.
In this embodiment, the preset value m may be set according to actual needs, so as to calculate the slope for mapping, for example, m may be set to 1.
For example, in the present embodiment, the Int8 fixed point forward framework may be employed to convert floating point arithmetic operations into integer operations. In this case, the number of integers in the preset value interval is 256, for example, the preset value interval may be [0, 255], and in this case, n is 256.
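For instance, with the Int8 interval [0, 255] (n = 256), weights of a hypothetical layer spanning [-0.5, 0.5] would map as follows (the layer values and the slope form (b - a)/(n - 1) are illustrative assumptions):

```python
import numpy as np

a, b, n = -0.5, 0.5, 256            # layer min/max and number of integer levels
s = (b - a) / (n - 1)               # mapping slope (assumed form)
w = np.array([-0.5, -0.25, 0.5])    # example weight values
q = np.round((w - a) / s)           # integers in [0, 255]
```

Here the minimum weight lands on level 0 and the maximum on level 255, with intermediate values rounded to the nearest level.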
Optionally, in this embodiment, the method for calculating the slope of the mapping according to the maximum value and the minimum value includes:
s(a, b, n) = (b - a)/(n - 1)
wherein a is the minimum value of the network parameters in the layer to be quantized, b is the maximum value of the network parameters in the layer to be quantized, n is the number of integers in a preset value interval, and s (a, b, n) is the slope of mapping.
When the maximum value and the minimum value in the network parameters of the layer to be quantized are equal or the maximum value is smaller than the parameter threshold, the calculation formula of s (a, b, n) is as follows:
s(a, b, n) = m/(n - 1)
for example, when m is 1, the calculation formula of s (a, b, n) is as follows:
s(a, b, n) = 1/(n - 1)
optionally, in this embodiment, the method of mapping the parameter of the layer to be quantized to the integer q (r, a, a + m, n) within the preset value interval according to the slope is divided into two cases.
When the maximum value and the minimum value in the network parameters of the layer to be quantized are equal or the maximum value is smaller than the parameter threshold, q (r, a, b, n) is calculated in the following manner.
clamp(r;a,b):=min(max(r,a),b)
q(r, a, b, n) := round((clamp(r; a, b) - a)/s(a, b, n))
b=a+m
In an embodiment where m is 1, the formula for q (r, a, b, n) is:
clamp(r;a,a+1):=min(max(r,a),a+1)
q(r, a, a+1, n) := round((clamp(r; a, a+1) - a) * (n - 1))
when the maximum value and the minimum value in the network parameters of the layer to be quantized are not equal and the maximum value is greater than or equal to the parameter threshold, q (r, a, b, n) is calculated in the following manner.
clamp(r;a,b):=min(max(r,a),b)
q(r, a, b, n) := round((clamp(r; a, b) - a)/s(a, b, n))
where r is a network parameter in the layer to be quantized, clamp(r; a, b) is the truncation function, q(r, a, b, n) is the integer to which r is mapped within the preset value interval, and s(a, b, n) is the slope of the mapping.
In this embodiment, in the process of performing model quantization, the mapped network parameters are rounded first, and then the rounded network parameters are inversely mapped to floating point numbers, so that the forward quantization loss can be simulated in training, thereby achieving the purpose of reducing the quantization loss of the forward frame.
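The round-trip described here, map to an integer level, round, then inversely map, can be sketched as a single fake-quantization helper. This is a simulation sketch under the assumed slope form; the name fake_quantize is not from the patent:

```python
import numpy as np

def fake_quantize(r, a, b, n=256):
    # Simulate forward quantization loss: float -> integer level -> float.
    s = (b - a) / (n - 1)                       # mapping slope
    q = np.round((np.clip(r, a, b) - a) / s)    # integer level in [0, n-1]
    return q * s + a                            # inverse mapping
```

Inside [a, b] the returned value differs from the input by at most half a quantization step (s/2), which is exactly the loss the Int8 fixed-point forward framework will incur at inference time.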
In this embodiment, when activation values are simulated-quantized, a is the minimum activation value in the activation layer and b is the maximum activation value in the activation layer. When weight values are simulated-quantized, a is the minimum weight value in the layer to be quantized and b is the maximum weight value in the layer to be quantized.
Referring to FIG. 4, for example, consider a model to be quantized that includes a convolution operation, an addition operation, and a ReLU6 operation, with model input "input", model output "output", and bias "bias". The weights need simulated quantization, so a simulated quantization operation (wt quant) is added after "weight". The convolution conv outputs activation values that need simulated quantization, so a simulated quantization operation (act quant) is added after the conv operation (the symbol "conv quant" in FIG. 4). The add layer and the ReLU6 layer may be merged into one layer, so the activation values output by the add layer are not simulated-quantized; instead, a simulated quantization operation (act quant) is added after the activation values output by ReLU6, as shown following ReLU6 in FIG. 4.
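The placement of the simulated-quantization operations described for FIG. 4 can be sketched as a forward pass. A matrix multiplication stands in for the convolution, and all names and the per-tensor min/max scheme are illustrative assumptions, not the patent's exact implementation:

```python
import numpy as np

def fake_quant(x, n=256, m=1.0):
    # Per-tensor simulated quantization with the degenerate-range fix.
    a, b = float(x.min()), float(x.max())
    if b == a:                       # equal max/min: widen by preset m
        b = a + m
    s = (b - a) / (n - 1)
    return np.round((np.clip(x, a, b) - a) / s) * s + a

def forward(x, w, bias):
    w_q = fake_quant(w)              # wt quant: simulate weight quantization
    y = x @ w_q                      # stand-in for the conv operation
    y = fake_quant(y)                # act quant after conv ("conv quant")
    y = np.clip(y + bias, 0.0, 6.0)  # add + ReLU6 fused; no quant in between
    return fake_quant(y)             # act quant after the ReLU6 output
```

Note that no fake-quant operation sits between the addition and ReLU6, mirroring the fusion of those two layers in the figure.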
Optionally, in this embodiment, the method further includes: obtaining a plurality of training samples; and inputting the training samples into a pre-trained original network model for network training to obtain the model to be quantized.
In this embodiment, the original network model is trained by using the training samples, so as to obtain a model to be quantized.
Optionally, in this embodiment, the method further includes acquiring data to be processed; and inputting the data to be processed into the target model for data processing to obtain a processing result of the data to be processed.
In this embodiment, the trained target model is used to process the data to be processed. Because the target model is a quantized deep learning model, it consumes few hardware resources, can process the data to be processed quickly and accurately, and can be applied to devices such as mobile terminals.
Referring to fig. 5, an embodiment of the present application further provides a model quantization training apparatus 110, which includes an obtaining module 111, a determining module 112, an adjusting module 113, and a quantization module 114. The model quantization training apparatus 110 includes a software functional module that can be stored in the memory 120 in the form of software or firmware, or solidified in an operating system (OS) of the electronic device 100.
An obtaining module 111, configured to obtain, for each layer to be quantized in the model to be quantized, a network parameter of the layer to be quantized.
The obtaining module 111 in this embodiment is used to execute step S110, and the detailed description about the obtaining module 111 may refer to the description about step S110.
The determining module 112 is configured to determine whether a maximum value and a minimum value in the network parameters of the layer to be quantized are equal to each other, or whether the maximum value is smaller than a preset parameter threshold.
The determining module 112 in this embodiment is used to execute step S120, and the detailed description about the determining module 112 may refer to the description about step S120.
And the adjusting module 113 is configured to take the sum of the maximum value and a preset value as a new maximum value when the maximum value and the minimum value in the network parameters of the layer to be quantized are equal to each other or the maximum value is smaller than a parameter threshold.
The adjusting module 113 in this embodiment is used to execute step S130, and the detailed description about the adjusting module 113 may refer to the description about step S130.
And the quantization module 114 is configured to perform network quantization according to the maximum value and the minimum value of the parameter in the layer to be quantized, so as to obtain the target model.
The quantization module 114 in this embodiment is used to execute step S140, and the detailed description about the quantization module 114 may refer to the description about step S140.
Optionally, in this embodiment, the quantization module 114 is specifically configured to: calculate the slope of the mapping according to the maximum value and the minimum value; map the network parameters of the layer to be quantized into integers in a preset value interval according to the slope; and inversely map the integers into floating-point numbers according to the slope to obtain the target model.
An embodiment of the present application further provides a machine-readable storage medium in which an executable program is stored; when the executable program is executed by the processor 130, the method according to any of the embodiments is implemented.
The above description is only for various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are intended to be included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for model quantization training, the method comprising:
acquiring a network parameter of each layer to be quantized in a model to be quantized, wherein the network parameter is a weight value or an activation value in the network;
judging whether the maximum value and the minimum value in the network parameters of the layer to be quantized are equal, or whether the maximum value is smaller than a preset parameter threshold;
if the maximum value and the minimum value in the network parameters of the layer to be quantized are equal, or the maximum value is smaller than the parameter threshold, taking the sum of the maximum value and a preset value as a new maximum value; and
performing network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized to obtain a target model.
2. The method according to claim 1, wherein performing network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized to obtain the target model comprises:
calculating the slope of the mapping according to the maximum value and the minimum value;
mapping the network parameters of the layer to be quantized to integers in a preset numerical interval according to the slope; and
inversely mapping the integers to floating-point numbers according to the slope to obtain the target model.
3. The method according to claim 2, wherein the slope of the mapping is calculated from the maximum value and the minimum value as:
s(a, b, n) = (b - a) / (n - 1)
wherein a is the minimum value of the network parameters in the layer to be quantized, b is the maximum value of the network parameters in the layer to be quantized, n is the number of integers in the preset numerical interval, and s(a, b, n) is the slope of the mapping.
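For instance, with 8-bit quantization the preset numerical interval contains n = 256 integers, so a layer whose parameters span [-1, 1] gets a slope of 2/255 ≈ 0.00784 (a minimal sketch; the function name is an assumption for the example):

```python
def slope(a, b, n):
    # s(a, b, n) = (b - a) / (n - 1): the step size between adjacent integer levels
    return (b - a) / (n - 1)

# 256 integer levels spread over the layer range [-1, 1]
print(slope(-1.0, 1.0, 256))  # 2/255
```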
4. The method according to claim 3, wherein, when the maximum value and the minimum value in the network parameters of the layer to be quantized are equal or the maximum value is smaller than the parameter threshold, the parameters of the layer to be quantized are mapped to integers q(r, a, b, n) within the preset numerical interval according to the slope as follows:
clamp(r; a, b) := min(max(r, a), b)
q(r, a, b, n) := round((clamp(r; a, b) - a) / s(a, b, n))
b = a + m
when the maximum value and the minimum value in the network parameters of the layer to be quantized are not equal and the maximum value is greater than or equal to the parameter threshold, the parameters of the layer to be quantized are mapped to integers within the preset numerical interval according to the slope as follows:
clamp(r; a, b) := min(max(r, a), b)
q(r, a, b, n) := round((clamp(r; a, b) - a) / s(a, b, n))
wherein r is a network parameter in the layer to be quantized, clamp(r; a, b) is the clamping (truncation) function, q(r, a, b, n) is the value obtained after r is mapped to the preset numerical interval, n is the number of integers in the preset numerical interval, and s(a, b, n) is the slope of the mapping.
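Both branches of this claim can be sketched together in Python (an illustrative sketch, not the claim itself; the function names and the concrete values of the parameter threshold and the preset value m are assumptions, since the claims leave them unspecified):

```python
def clamp(r, a, b):
    """clamp(r; a, b) := min(max(r, a), b)"""
    return min(max(r, a), b)

def quantize_to_int(r, a, b, n, threshold=1e-6, m=1e-6):
    """Map parameter r to an integer q(r, a, b, n) in [0, n - 1].

    If the layer's range is degenerate (a == b) or the maximum is below
    the parameter threshold, widen the range first with b = a + m so the
    slope s(a, b, n) is never zero.
    """
    if a == b or b < threshold:
        b = a + m                           # new maximum value: b = a + m
    s = (b - a) / (n - 1)                   # slope s(a, b, n)
    return round((clamp(r, a, b) - a) / s)  # q(r, a, b, n)
```

Without the b = a + m adjustment, a constant layer (a == b) would make the slope zero and the division undefined, which is the failure mode the adjusting step of claim 1 guards against.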
5. The method according to any one of claims 1-4, further comprising:
obtaining a plurality of training samples;
and inputting the training samples into a pre-trained original network model for network training to obtain the model to be quantized.
6. The method according to any one of claims 1-4, further comprising:
acquiring data to be processed;
and inputting the data to be processed into the target model for data processing to obtain a processing result of the data to be processed.
7. An apparatus for model quantization training, the apparatus comprising:
the acquisition module is used for acquiring the network parameters of each layer to be quantized in the model to be quantized;
the judging module is used for judging whether the maximum value and the minimum value in the network parameters of the layer to be quantized are equal, or whether the maximum value is smaller than a preset parameter threshold;
the adjusting module is used for taking the sum of the maximum value and a preset value as a new maximum value when the maximum value and the minimum value in the network parameters of the layer to be quantized are equal or the maximum value is smaller than the parameter threshold; and
the quantization module is used for performing network quantization according to the maximum value and the minimum value of the parameters in the layer to be quantized to obtain a target model.
8. The apparatus of claim 7, wherein the quantization module is specifically configured to:
calculating the slope of the mapping according to the maximum value and the minimum value;
mapping the network parameters of the layer to be quantized to integers in a preset numerical interval according to the slope; and
inversely mapping the integers to floating-point numbers according to the slope to obtain the target model.
9. A machine readable storage medium, characterized in that the machine readable storage medium stores an executable program which, when executed by a processor, implements the method according to any one of claims 1-6.
10. An electronic device, comprising a memory and a processor, the memory and the processor being electrically connected, the memory having stored therein an executable program, the processor, when executing the executable program, implementing the method of any one of claims 1-6.
CN202010634312.3A 2020-07-02 2020-07-02 Model quantization training method and device, machine-readable storage medium and electronic equipment Active CN111783957B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010634312.3A CN111783957B (en) 2020-07-02 2020-07-02 Model quantization training method and device, machine-readable storage medium and electronic equipment


Publications (2)

Publication Number Publication Date
CN111783957A true CN111783957A (en) 2020-10-16
CN111783957B CN111783957B (en) 2024-05-03

Family

ID=72758914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010634312.3A Active CN111783957B (en) 2020-07-02 2020-07-02 Model quantization training method and device, machine-readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111783957B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022222649A1 (en) * 2021-04-23 2022-10-27 Oppo广东移动通信有限公司 Neural network model training method and apparatus, device, and storage medium

Citations (10)

Publication number Priority date Publication date Assignee Title
US20180032866A1 (en) * 2016-07-28 2018-02-01 Samsung Electronics Co., Ltd. Neural network method and apparatus
CN108628807A (en) * 2017-03-20 2018-10-09 北京百度网讯科技有限公司 Processing method, device, equipment and the computer readable storage medium of floating-point matrix number
CN109102064A (en) * 2018-06-26 2018-12-28 杭州雄迈集成电路技术有限公司 A kind of high-precision neural network quantization compression method
CN109472353A (en) * 2018-11-22 2019-03-15 济南浪潮高新科技投资发展有限公司 A kind of convolutional neural networks sample circuit and quantization method
CN109583561A (en) * 2017-09-28 2019-04-05 杭州海康威视数字技术股份有限公司 A kind of the activation amount quantization method and device of deep neural network
WO2019120114A1 (en) * 2017-12-21 2019-06-27 深圳励飞科技有限公司 Data fixed point processing method, device, electronic apparatus and computer storage medium
CN110245753A (en) * 2019-05-27 2019-09-17 东南大学 A kind of neural network compression method based on power exponent quantization
EP3543917A1 (en) * 2018-03-19 2019-09-25 SRI International Inc. Dynamic adaptation of deep neural networks
CN110363297A (en) * 2019-07-05 2019-10-22 上海商汤临港智能科技有限公司 Neural metwork training and image processing method, device, equipment and medium
CN110610237A (en) * 2019-09-17 2019-12-24 普联技术有限公司 Quantitative training method and device of model and storage medium


Non-Patent Citations (3)

Title
CHOI J et al.: "Accurate and efficient 2-bit quantized neural networks", Proceedings of Machine Learning and Systems, vol. 1, 31 December 2019 (2019-12-31), pages 348-359 *
FANG J et al.: "Post-training piecewise linear quantization for deep neural networks", Computer Vision - ECCV 2020: 16th European Conference, 18 March 2020 (2020-03-18), pages 69-86 *
SIGAI Learning and Practice Platform: "Compression and Acceleration of Convolutional Neural Networks", HTTPS://CLOUD.TENCENT.COM/DEVELOPER/ARTICLE/1152191, 26 June 2018 (2018-06-26), pages 1-26 *



Similar Documents

Publication Publication Date Title
CN110929865B (en) Network quantification method, service processing method and related product
CN110413812B (en) Neural network model training method and device, electronic equipment and storage medium
CN110175641B (en) Image recognition method, device, equipment and storage medium
CN114186632B (en) Method, device, equipment and storage medium for training key point detection model
CN109840589A (en) A kind of method, apparatus and system running convolutional neural networks on FPGA
CN111105017B (en) Neural network quantization method and device and electronic equipment
CN111144457A (en) Image processing method, device, equipment and storage medium
CN114429208A (en) Model compression method, device, equipment and medium based on residual structure pruning
CN111783957A (en) Model quantitative training method and device, machine-readable storage medium and electronic equipment
CN111461302A (en) Data processing method, device and storage medium based on convolutional neural network
KR102368590B1 (en) Electronic apparatus and control method thereof
CN116992946A (en) Model compression method, apparatus, storage medium, and program product
CN115062777B (en) Quantization method, quantization device, equipment and storage medium of convolutional neural network
CN111368978A (en) Precision improving method for offline quantization tool
CN113673532B (en) Target detection method and device based on quantitative model
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN112069022B (en) NPU type server power consumption testing method and system
CN114139678A (en) Convolutional neural network quantization method and device, electronic equipment and storage medium
CN113962332A (en) Salient target identification method based on self-optimization fusion feedback
CN115249058A (en) Quantification method and device of neural network model, terminal and storage medium
CN113902928A (en) Image feature extraction method and device and electronic equipment
CN117893975B (en) Multi-precision residual error quantization method in power monitoring and identification scene
CN117058668B (en) Three-dimensional model face reduction evaluation method and device
CN116957007A (en) Feature quantization method, device, medium and program product for neural network training
CN111062468A (en) Training method and system for generating network, and image generation method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant