CN116205281A - Network model quantization method, apparatus, computer device, and storage medium - Google Patents

Network model quantization method, apparatus, computer device, and storage medium

Info

Publication number
CN116205281A
CN116205281A (application CN202211647925.6A)
Authority
CN
China
Prior art keywords
quantization
model
precision
training
random
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211647925.6A
Other languages
Chinese (zh)
Inventor
胡中华
陈炫憧
江文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Signalway Technologies Co ltd
Original Assignee
Beijing Signalway Technologies Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Signalway Technologies Co ltd filed Critical Beijing Signalway Technologies Co ltd
Priority to CN202211647925.6A priority Critical patent/CN116205281A/en
Publication of CN116205281A publication Critical patent/CN116205281A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The present disclosure relates to the field of model training technologies, and in particular to a network model quantization method, a network model quantization apparatus, a computer device, and a storage medium. The method comprises the following steps: performing quantization training on a pre-training model according to at least two random quantization precisions and a preset quantization precision, respectively, to obtain a random quantization model corresponding to each random quantization precision and a fixed quantization model corresponding to the preset quantization precision; determining a first quantization model from the random quantization models according to the prediction error between each random quantization model and the fixed quantization model; performing quantization training on an actual training model according to the random quantization precision corresponding to the first quantization model to obtain a second quantization model, the actual training model being obtained by training the pre-training model on sample operation data; and performing knowledge distillation training on the second quantization model according to the actual training model to obtain a quantized network model. The method reduces the operation difficulty and the number of training passes required to determine the quantized network model.

Description

Network model quantization method, apparatus, computer device, and storage medium
Technical Field
The present disclosure relates to the field of model training technologies, and in particular to a network model quantization method, a network model quantization apparatus, a computer device, and a storage medium.
Background
With the continuous development of artificial intelligence technology, network models are designed with increasingly complex structures and increasingly large numbers of parameters, so the configuration requirements for the hardware devices that run them grow accordingly.
In the prior art, to ensure that a hardware device can run a network model stably, the network model needs to be compressed to reduce its numerical precision, so that a hardware device with a low configuration does not degrade the model's computational efficiency at run time.
However, compressing a network model requires training it multiple times, so the prior-art compression methods are cumbersome, and the efficiency of compressing a network model is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a network model quantization method, apparatus, computer device, and storage medium that can improve the efficiency of quantizing a network model.
In a first aspect, the present application provides a network model quantization method. The method comprises the following steps:
performing quantization training on a pre-training model according to at least two random quantization precisions and a preset quantization precision, respectively, to obtain a random quantization model corresponding to each random quantization precision and a fixed quantization model corresponding to the preset quantization precision;
determining a first quantization model from the random quantization models according to the prediction error between each random quantization model and the fixed quantization model;
performing quantization training on an actual training model according to the random quantization precision corresponding to the first quantization model to obtain a second quantization model, wherein the actual training model is obtained by training the pre-training model based on sample operation data;
and performing knowledge distillation training on the second quantization model according to the actual training model to obtain a quantized network model.
In one embodiment, determining a target quantization precision from the random quantization precisions corresponding to the network layers in the first quantization model according to the sorting result, and determining the layers to be quantized corresponding to the target quantization precision in the actual training model, includes:
determining the highest quantization precision among the random quantization precisions corresponding to the network layers in the first quantization model according to the sorting result;
judging whether the highest quantization precision is the same as the network precision of the actual training model;
if they are different, taking the highest quantization precision as the target quantization precision, and taking all network layers of the actual training model as the layers to be quantized corresponding to the target quantization precision;
if they are the same, determining a locked layer corresponding to the highest quantization precision according to the network layer corresponding to the highest quantization precision in the first quantization model, taking the next random quantization precision ranked after the highest quantization precision as the target quantization precision, and taking the network layers of the actual training model other than the locked layer as the layers to be quantized corresponding to the target quantization precision.
In one embodiment, performing quantization training on the layers to be quantized in the actual training model according to the target quantization precision includes:
performing quantization processing on the corresponding layers to be quantized in the actual training model according to the target quantization precision to obtain a quantization model to be adjusted;
training the quantization model to be adjusted based on the sample operation data.
In one embodiment, training the quantization model to be adjusted based on the sample operation data includes:
performing knowledge distillation training on the quantization model to be adjusted based on the sample operation data and the actual training model.
In one embodiment, determining a first quantization model from the random quantization models based on the prediction error between each random quantization model and the fixed quantization model comprises:
determining a first dispersion of the prediction results output by each random quantization model based on the sample operation data, and determining a second dispersion of the prediction results output by the fixed quantization model based on the sample operation data;
determining the prediction error between each random quantization model and the fixed quantization model according to each first dispersion and the second dispersion;
and taking the random quantization model corresponding to the minimum prediction error as the first quantization model.
In a second aspect, the present application further provides a network model quantization apparatus. The apparatus comprises:
a first training module, configured to perform quantization training on the pre-training model according to at least two random quantization precisions and a preset quantization precision, respectively, to obtain a random quantization model corresponding to each random quantization precision and a fixed quantization model corresponding to the preset quantization precision;
a determination module, configured to determine a first quantization model from the random quantization models based on the prediction error between each random quantization model and the fixed quantization model;
a second training module, configured to perform quantization training on the actual training model according to the random quantization precision corresponding to the first quantization model to obtain a second quantization model, the actual training model being obtained by training the pre-training model based on sample operation data;
and a third training module, configured to perform knowledge distillation training on the second quantization model according to the actual training model to obtain a quantized network model.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor that, when executing the computer program, implements the network model quantization method of any embodiment of the first aspect.
In a fourth aspect, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the network model quantization method of any embodiment of the first aspect.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the network model quantization method of any embodiment of the first aspect.
According to the above technical solution, determining the random quantization models and the fixed quantization model provides a data basis for the subsequent determination of the first quantization model, so that the first quantization model can be selected from the random quantization models according to the fixed quantization model; this yields a hybrid-quantization-friendly first quantization model and ensures the accuracy of the quantized network model determined later. Determining the second quantization model from the first quantization model realizes a preliminary quantization of the actual training model, with each network layer of the second quantization model having the same quantization precision as the corresponding layer of the first quantization model, which ensures that the per-layer quantization precision of the resulting quantized network model meets the actual requirements. Performing knowledge distillation training on the second quantization model ensures the accuracy and practicability of the quantized network model, improves the efficiency of determining it, reduces the operation difficulty, reduces the number of training passes required, and avoids wasting time in the training process.
Drawings
Fig. 1 is an application environment diagram of a network model quantization method provided in an embodiment of the present application;
Fig. 2 is a flowchart of a network model quantization method provided in an embodiment of the present application;
Fig. 3 is a flowchart illustrating the steps of determining a second quantization model according to an embodiment of the present application;
Fig. 4 is a flowchart illustrating the steps of determining a first quantization model according to an embodiment of the present application;
Fig. 5 is a flowchart of another network model quantization method provided in an embodiment of the present application;
Fig. 6 is a block diagram of a first network model quantization apparatus according to an embodiment of the present application;
Fig. 7 is a block diagram of a second network model quantization apparatus according to an embodiment of the present application;
Fig. 8 is a block diagram of a third network model quantization apparatus according to an embodiment of the present application;
Fig. 9 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In the description of the present application, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art may combine the different embodiments or examples described in this specification, and the features thereof, without contradiction.
With the continuous development of artificial intelligence technology, network models are designed with increasingly complex structures and increasingly large numbers of parameters, so the configuration requirements for the hardware devices that run them grow accordingly.
In the prior art, to ensure that a hardware device can run a network model stably, the network model needs to be compressed to reduce its numerical precision, so that a hardware device with a low configuration does not degrade the model's computational efficiency at run time.
However, compressing a network model requires training it multiple times to preserve the accuracy of its prediction results; without such repeated training, the accuracy of the prediction results drops and the network model cannot be used normally. The prior-art compression methods are therefore cumbersome, and the efficiency of compressing a network model is low.
The network model quantization method provided by the embodiments of the present application can be applied to the application environment shown in Fig. 1. In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in Fig. 1. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, computer programs, and a database, and the internal memory provides an environment for running the operating system and the computer programs stored in the non-volatile storage medium. The database of the computer device stores the data acquired for network model quantization. The network interface of the computer device communicates with external terminals through a network connection. The computer program, when executed by the processor, implements a network model quantization method.
The present application discloses a network model quantization method, a network model quantization apparatus, a computer device, and a storage medium of the computer device. The worker's computer device determines the random quantization models and the fixed quantization model, and determines a first quantization model from the random quantization models according to the fixed quantization model; it then performs quantization training on the actual training model according to the first quantization model to obtain a second quantization model, and performs knowledge distillation training on the second quantization model to obtain a quantized network model.
In one embodiment, as shown in Fig. 2, which is a flowchart of a network model quantization method provided in an embodiment of the present application, the network model quantization method performed by the computer device in Fig. 1 may include the following steps:
Step 201, performing quantization training on the pre-training model according to at least two random quantization precisions and a preset quantization precision, respectively, to obtain a random quantization model corresponding to each random quantization precision and a fixed quantization model corresponding to the preset quantization precision.
The pre-training model is a model obtained by building a network structure in advance for an actual usage scenario and training that network structure on sample operation data. The actual usage scenario may vary with the actual situation, and the pre-trained network structures corresponding to different scenarios may be the same or different. Further, the sample operation data is associated with the actual usage scenario; that is, different scenarios correspond to different sample operation data. For example, when the actual usage scenario is vehicle detection, the corresponding sample operation data is data related to vehicle detection.
As an implementation, when the actual usage scenario is vehicle detection, the corresponding sample operation data may be toll-station, parking-lot, and roadside scene pictures, where each picture contains at least one vehicle and is annotated with the position and type of each vehicle; the vehicle types are mainly small cars, buses, and large vehicles.
It should be noted that a random quantization model is a quantization model in which the precision of each network layer is set randomly. The network layers may include, but are not limited to, conv (convolution) layers, linear layers, and activation layers. In summary, when a random quantization model needs to be determined, the precision of each network layer can first be chosen at random, i.e., a random quantization precision is determined; the pre-training model is then quantization-trained according to that randomly determined precision, and the resulting model is the random quantization model corresponding to that random quantization precision.
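The random per-layer precision assignment described above can be sketched as follows; the candidate bit-widths, layer names, and function names are illustrative assumptions, not values fixed by this disclosure:

```python
import random

# Hypothetical candidate bit-widths for each network layer (assumption).
CANDIDATE_PRECISIONS = [4, 8, 16]

def sample_random_precisions(layer_names, seed=None):
    """Assign a randomly chosen quantization precision to each network layer."""
    rng = random.Random(seed)
    return {name: rng.choice(CANDIDATE_PRECISIONS) for name in layer_names}

# Illustrative layer names; a real pre-training model would supply these.
layers = ["conv1", "conv2", "linear1"]
config = sample_random_precisions(layers, seed=0)
print(config)
```

Quantization-training the pre-training model under one such configuration would yield the random quantization model corresponding to that configuration; repeating with fresh random draws yields the "at least two" random quantization models.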
Further, the fixed quantization model is a quantization model in which every network layer conforms to a preset precision. Specifically, when the fixed quantization model needs to be determined, its quantization precision can be preset according to actual requirements, i.e., the preset quantization precision; the pre-training model is then quantization-trained according to the preset quantization precision so that the precision of every network layer in the trained model conforms to it, and the trained pre-training model is the fixed quantization model.
In one embodiment of the present application, the preset precision ensures that the computer device can run the quantized network model obtained later, avoiding the situation where the precision of the quantized network model is too high for the device to run it. Therefore, when the preset quantization precision needs to be determined, it can be set according to the configuration of the computer device that will actually run the quantized network model; for example, the maximum precision the device can run may be used as the preset quantization precision of the fixed quantization model.
Step 202, determining a first quantization model from the random quantization models according to the prediction error between each random quantization model and the fixed quantization model.
It should be noted that the first quantization model is the random quantization model whose prediction results on the sample operation data are most similar to those of the fixed quantization model. Therefore, to determine the first quantization model, the prediction error between each random quantization model and the fixed quantization model is determined first; the random quantization model with the smallest prediction error relative to the fixed quantization model is then selected, and that model is the first quantization model.
In one embodiment of the present application, test sample data may be preset and may be selected from the sample operation data. The test sample data is input into each random quantization model to obtain the prediction result each random quantization model outputs for it, and into the fixed quantization model to obtain the prediction result the fixed quantization model outputs for it. The prediction error between each random quantization model and the fixed quantization model is then determined from these prediction results, and finally the random quantization model with the smallest prediction error relative to the fixed quantization model is selected from the random quantization models; that model is the first quantization model.
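As a toy illustration of this test-sample workflow, the models below are plain functions standing in for real networks, and the mean absolute difference between predictions is one possible, assumed realization of the prediction error:

```python
def mean_abs_error(preds_a, preds_b):
    """Mean absolute difference between two lists of predictions."""
    return sum(abs(a - b) for a, b in zip(preds_a, preds_b)) / len(preds_a)

def pick_min_error_model(random_models, fixed_model, test_samples):
    """Return the index of the random model closest to the fixed model."""
    fixed_preds = [fixed_model(x) for x in test_samples]
    errors = [mean_abs_error([m(x) for x in test_samples], fixed_preds)
              for m in random_models]
    return errors.index(min(errors))

# Stand-in models; real ones would be quantized networks.
def fixed_model(x):
    return 2 * x

def random_model_a(x):  # offset from the fixed model by a constant
    return 2 * x + 0.5

def random_model_b(x):  # only slightly different slope
    return 2.1 * x

samples = [0.0, 0.5, 1.0]
random_models = [random_model_a, random_model_b]
best = pick_min_error_model(random_models, fixed_model, samples)
print(best)
```

Here the second candidate tracks the fixed model more closely, so it would be chosen as the first quantization model.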
Further, the prediction error can be determined from the degree of difference between the prediction result a random quantization model outputs for the test sample data and the prediction result the fixed quantization model outputs for the same data: the smaller that degree of difference, the smaller the prediction error between the random quantization model and the fixed quantization model; the larger the degree of difference, the larger the prediction error.
In one embodiment of the present application, the prediction results each random quantization model outputs based on the sample operation data, and the prediction results the fixed quantization model outputs based on the sample operation data, may be determined; the dispersion of each random quantization model's prediction results and the dispersion of the fixed quantization model's prediction results are then computed; and the dispersion most similar to that of the fixed quantization model is found among the random quantization models' dispersions, the random quantization model corresponding to that dispersion being the first quantization model.
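A minimal sketch of this dispersion-based selection, assuming variance as the dispersion measure (the disclosure does not fix a particular measure, so this choice and the sample predictions are assumptions):

```python
import statistics

def prediction_error(random_preds, fixed_preds):
    """Prediction error as the gap between the two output dispersions."""
    return abs(statistics.variance(random_preds) - statistics.variance(fixed_preds))

def select_first_quantization_model(random_model_preds, fixed_preds):
    """Index of the random model whose dispersion best matches the fixed model's."""
    errors = [prediction_error(p, fixed_preds) for p in random_model_preds]
    return errors.index(min(errors))

# Hypothetical prediction lists from the fixed model and two random models.
fixed = [0.9, 0.1, 0.8, 0.2]
candidates = [
    [0.5, 0.5, 0.5, 0.5],     # low dispersion, poor match
    [0.85, 0.15, 0.75, 0.25], # dispersion close to the fixed model's
]
best = select_first_quantization_model(candidates, fixed)
print(best)
```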
Step 203, performing quantization training on the actual training model according to the random quantization precision corresponding to the first quantization model to obtain a second quantization model.
The actual training model is obtained by training the pre-training model based on sample operation data; the sample operation data can be set according to actual requirements.
The second quantization model is obtained by quantization training according to the random quantization precision corresponding to the first quantization model; thus, each network layer of the second quantization model has the same precision as the corresponding network layer of the first quantization model.
In an embodiment of the present application, when the second quantization model needs to be determined, each network layer of the actual training model may be quantization-trained according to the precision of the corresponding network layer in the first quantization model, so that the precision of each network layer in the actual training model matches that of the corresponding layer in the first quantization model; the resulting actual training model is the second quantization model.
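An illustrative sketch of this per-layer step, using a toy uniform quantizer rather than the actual quantization-training procedure; the weights, value range, and layer names are assumptions:

```python
def quantize(value, bits, value_range=(-1.0, 1.0)):
    """Uniformly quantize a weight to the given bit-width over value_range."""
    lo, hi = value_range
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    index = round((value - lo) / step)
    return lo + index * step

def quantize_model(weights, layer_precisions):
    """Quantize each layer's weights at that layer's assigned precision."""
    return {layer: [quantize(w, layer_precisions[layer]) for w in ws]
            for layer, ws in weights.items()}

# Hypothetical float weights of the actual training model.
weights = {"conv1": [0.31, -0.77], "linear1": [0.05]}
# Per-layer precisions mirroring the first quantization model.
precisions = {"conv1": 8, "linear1": 4}
quantized = quantize_model(weights, precisions)
print(quantized)
```

In practice this rounding would be interleaved with training on the sample operation data, which is what the quantization training in step 203 provides.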
Step 204, performing knowledge distillation training on the second quantization model according to the actual training model to obtain a quantized network model.
When knowledge distillation training needs to be performed on the second quantization model, the actual training model is used as the teacher model and the second quantization model as the student model; knowledge distillation training is then performed on the student model according to the teacher model, so that the experience knowledge of the teacher model, that is, the knowledge obtained by training the actual training model, is transferred to the second quantization model, guaranteeing the accuracy of the results the second quantization model outputs.
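A minimal sketch of this teacher-student step, assuming the common soft-label cross-entropy distillation loss with a temperature; the loss form, temperature, and logit values are standard choices assumed for illustration, not mandated by this disclosure:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of student soft predictions against teacher soft targets."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))

teacher = [2.0, 0.5, 0.1]  # hypothetical logits from the actual training model
student = [1.8, 0.6, 0.2]  # hypothetical logits from the second quantization model
loss = distillation_loss(teacher, student)
print(round(loss, 4))
```

Minimizing this loss over the sample operation data pulls the student's output distribution toward the teacher's, which is how the teacher's experience knowledge is transferred.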
According to the above network model quantization method, determining the random quantization models and the fixed quantization model provides a data basis for the subsequent determination of the first quantization model, so that the first quantization model can be selected from the random quantization models according to the fixed quantization model; this yields a hybrid-quantization-friendly first quantization model and ensures the accuracy of the quantized network model determined later. Determining the second quantization model from the first quantization model realizes a preliminary quantization of the actual training model, with each network layer of the second quantization model having the same quantization precision as the corresponding layer of the first quantization model, which ensures that the per-layer quantization precision of the resulting quantized network model meets the actual requirements. Performing knowledge distillation training on the second quantization model ensures the accuracy and practicability of the quantized network model, improves the efficiency of determining it, reduces the operation difficulty, reduces the number of training passes required, and avoids wasting time in the training process.
It should be noted that the first quantization model and the actual training model contain the same number of network layers, with a one-to-one correspondence between layers, so the second quantization model can be determined according to the sorted random quantization precisions of the network layers in the first quantization model. Optionally, as shown in Fig. 3, which is a flowchart illustrating the steps of determining a second quantization model according to an embodiment of the present application, determining the second quantization model may include the following steps:
Step 301, sorting the random quantization precisions corresponding to the network layers in the first quantization model, determining a target quantization precision from these random quantization precisions according to the sorting result, and determining the layers to be quantized corresponding to the target quantization precision in the actual training model.
The target quantization precision is initially the highest quantization precision among the random quantization precisions corresponding to the network layers in the first quantization model.
It should be noted that the highest quantization precision among the random quantization precisions corresponding to the network layers in the first quantization model is determined according to the sorting result, and it is judged whether this highest quantization precision is the same as the network precision of the actual training model. If they are different, the highest quantization precision is taken as the target quantization precision, and all network layers of the actual training model are taken as the layers to be quantized corresponding to the target quantization precision. If they are the same, a locking layer corresponding to the highest quantization precision is determined according to the network layer corresponding to the highest quantization precision in the first quantization model, the random quantization precision ordered next after the highest quantization precision is taken as the target quantization precision, and the network layers of the actual training model other than the locking layer are taken as the layers to be quantized corresponding to the target quantization precision.
In an embodiment of the present application, suppose the random quantization precisions corresponding to the network layers in the first quantization model are int8 for the first layer and int4 for the second layer, and the actual training model is a float 32-bit network. The sorting result of the random quantization precisions is then: int8, int4, where int8 is the highest quantization precision. It is judged whether int8 is the same as the network precision of the actual training model; since the actual training model is a float 32-bit network, int8 differs from the network precision of the actual training model, so int8 is taken as the target quantization precision, and all network layers of the actual training model are taken as the layers to be quantized corresponding to the target quantization precision.
In an embodiment of the present application, suppose the random quantization precisions corresponding to the network layers in the first quantization model are int8 for the first layer and int4 for the second layer, while the network precision of the actual training model is int8 for the first layer and int2 for the second layer. The sorting result of the random quantization precisions is: int8, int4, where int8 is the highest quantization precision. Since the first layer of the actual training model is also int8, the highest quantization precision is judged to be the same as the network precision of the actual training model; because int8 corresponds to the first layer in the first quantization model, the first layer of the actual training model is taken as a locking layer. The next random quantization precision in the ordering, int4, is taken as the target quantization precision, and the second layer of the actual training model, i.e. the layers other than the locking layer, is taken as the layer to be quantized corresponding to the target quantization precision.
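The judgment logic of step 301, as walked through in the two examples above, can be sketched as follows. This is a minimal illustration in which bit-widths stand in for quantization precisions and layers are indexed by position; the function name and list-based representation are our assumptions, not part of the patent:

```python
def select_target_precision(layer_precisions, model_precisions):
    """Pick the target quantization precision and the layers to quantize.

    layer_precisions: per-layer precisions of the first quantization model,
                      as bit-widths (e.g. [8, 4] for int8/int4).
    model_precisions: per-layer precisions of the actual training model
                      (e.g. [32, 32] for a float 32-bit network).
    Returns (target_precision, layers_to_quantize, locked_layers).
    """
    # Sort the distinct precisions from highest to lowest (step 301).
    order = sorted(set(layer_precisions), reverse=True)
    highest = order[0]
    if highest not in model_precisions:
        # Highest precision differs from the network precision of the
        # actual training model: all layers become layers to be quantized.
        return highest, list(range(len(model_precisions))), []
    # Highest precision already present in the actual training model:
    # lock the matching layers and take the next precision in the ordering.
    locked = [i for i, p in enumerate(layer_precisions) if p == highest]
    target = order[1]
    to_quantize = [i for i in range(len(model_precisions)) if i not in locked]
    return target, to_quantize, locked
```

Applied to the two examples: a float 32-bit network against [int8, int4] yields target int8 with all layers to be quantized; an [int8, int2] network yields target int4 with the first layer locked.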
And 302, carrying out quantization training on the corresponding layer to be quantized in the actual training model according to the target quantization precision, and determining a locking layer in the layer to be quantized.
The locking layers are in one-to-one correspondence with the network layers corresponding to the target quantization precision in the first quantization model, and the quantization precision of a locking layer is not changed during subsequent quantization training.
In an embodiment of the present application, suppose the random quantization precisions corresponding to the network layers in the first quantization model are int8 for the first layer and int4 for the second layer, the actual training model is a float 32-bit network, and all network layers of the actual training model are taken as the layers to be quantized. Quantization training is then performed on these layers at the target quantization precision int8; since int8 corresponds to the first layer in the first quantization model, the first layer of the actual training model is taken as the locking layer.
It should be noted that performing quantization training on a layer to be quantized specifically includes the following steps: performing quantization processing on the corresponding layer to be quantized in the actual training model according to the target quantization precision to obtain a quantization model to be adjusted, and training the quantization model to be adjusted based on the sample operation data.
As an implementation, suppose the random quantization precisions corresponding to the network layers in the first quantization model are int8 for the first layer and int4 for the second layer, the actual training model is a float 32-bit network, the target quantization precision is int8, and all network layers of the actual training model are taken as the layers to be quantized. When quantization training is performed on these layers, they are quantized according to int8, yielding a quantization model to be adjusted whose first-layer precision is int8 and whose second-layer precision is int8; this quantization model to be adjusted is then trained based on the sample operation data.
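The quantization processing itself is not pinned down by the patent. As a hedged sketch, one common choice is symmetric uniform quantization, where float weights are scaled onto the intN grid and then dequantized back ("fake quantization", as used in quantization-aware training); the function below is our assumption, not the patent's method:

```python
import numpy as np

def fake_quantize(weights, bits):
    """Simulate intN quantization of float weights (quantize-dequantize).

    Symmetric uniform quantization is assumed here; the patent does not
    fix a particular quantization scheme.
    """
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for int8, 7 for int4
    scale = np.abs(weights).max() / qmax
    if scale == 0:
        return weights.copy()
    # Round to the integer grid, clip to the representable range,
    # then dequantize back to float for further training.
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale
```

At int8 the quantize-dequantize round trip changes each weight by at most one quantization step, while at int4 the error is visibly larger, which is why lower-precision layers need the fine-tuning described next.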
Further, when training the quantization model to be adjusted with the sample operation data, knowledge distillation training can be performed on it based on the sample operation data and the actual training model. Specifically, the quantization model to be adjusted can first be trained on the sample operation data, so that its parameters are fine-tuned and adapted to the application scenario corresponding to the sample operation data; the quantization model to be adjusted is then taken as the student model and the actual training model as the teacher model, and knowledge distillation training is performed on the quantization model to be adjusted.
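The teacher-student distillation described above can be sketched as minimizing a divergence between the outputs of the actual training model (teacher) and the quantization model to be adjusted (student). The specific loss below, KL divergence on temperature-softened logits, is a common convention and an assumption on our part; the patent does not specify the loss:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    The teacher is the full-precision actual training model, the student
    is the quantization model to be adjusted (loss form is assumed).
    """
    p = softmax(teacher_logits, temperature)   # teacher distribution
    q = softmax(student_logits, temperature)   # student distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the student reproduces the teacher's outputs exactly and positive otherwise, so minimizing it pulls the quantized student toward the full-precision teacher's behavior.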
Step 303, determining whether the target quantization precision is the lowest among the random quantization precisions corresponding to the network layers in the first quantization model.
If not, the random quantization precision ordered next after the target quantization precision among the random quantization precisions corresponding to the network layers in the first quantization model is taken as a new target quantization precision, the layers to be quantized other than the locking layer are taken as new layers to be quantized, and step 302 is executed again based on the new target quantization precision and the new layers to be quantized. If yes, the actual training model after quantization training is taken as the second quantization model.
In an embodiment of the present application, suppose the random quantization precisions corresponding to the network layers in the first quantization model are int8 for the first layer and int4 for the second layer, the actual training model is a float 32-bit network, and all its network layers are taken as the layers to be quantized. After quantization training at the target quantization precision int8, the precision of the actual training model is int8 for both layers; since int8 corresponds to the first layer in the first quantization model, the first layer is taken as the locking layer. Because int8 is not the lowest of the random quantization precisions, int4 is taken as the new target quantization precision and the second layer, i.e. the layers other than the locking layer, as the new layer to be quantized, and quantization training is performed again according to the target quantization precision (step 302); after this training, the precision of the actual training model is int8 for the first layer and int4 for the second layer. Since the target quantization precision int4 is now the lowest of the random quantization precisions corresponding to the network layers in the first quantization model, the actual training model after quantization training is taken as the second quantization model.
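The loop formed by steps 301 to 303 — quantize the still-unlocked layers at the current target precision, lock the layers matching that precision, then move to the next lower precision — can be sketched as follows. Training itself is elided, and representing layer precisions as bit-widths is our assumption:

```python
def progressive_quantize(layer_precisions):
    """Iterate steps 301-303: quantize at successively lower precisions.

    layer_precisions: per-layer bit-widths of the first quantization
    model, e.g. [8, 4]. Returns the per-layer precisions reached by the
    second quantization model (quantization-training rounds are elided).
    """
    result = [None] * len(layer_precisions)
    locked = set()
    for target in sorted(set(layer_precisions), reverse=True):
        # Quantize every still-unlocked layer at the target precision
        # (a quantization-training round would run here, step 302).
        for i in range(len(result)):
            if i not in locked:
                result[i] = target
        # Lock the layers whose precision in the first quantization
        # model equals the target, so later rounds leave them untouched.
        locked |= {i for i, p in enumerate(layer_precisions) if p == target}
    return result
```

For the [int8, int4] example above, the first round sets both layers to int8 and locks the first layer, and the second round lowers only the second layer to int4, reproducing the precisions of the first quantization model.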
According to the network model quantization method, determining the sorting result of the random quantization precisions corresponding to the network layers in the first quantization model ensures that the target quantization precision can subsequently be determined from the sorting result, provides a basis for judging whether the target quantization precision is the lowest of the random quantization precisions, and reduces the operation difficulty of quantization training. By determining the layers to be quantized and the locking layers, the quantization precision of each network layer in the second quantization model can be made identical to that of the corresponding network layer in the first quantization model during quantization training at the target quantization precision, which ensures that the quantization precision of each network layer in the subsequently obtained quantization network model meets the actual requirement.
It should be noted that the first quantization model may be determined according to the first dispersion degree and the second dispersion degree. Optionally, as shown in fig. 4, fig. 4 is a flowchart illustrating a step of determining a first quantization model according to an embodiment of the present application. Specifically, determining the first quantization model may include the steps of:
step 401, determining a first dispersion of the prediction results output by each random quantization model based on the sample operation data, and determining a second dispersion of the prediction results output by the fixed quantization model based on the sample operation data.
When the first dispersion needs to be determined, the sample operation data can be input into each random quantization model, so that a set of first output results is obtained for each random quantization model; dispersion calculation is then performed on these first output results to determine the first dispersion corresponding to each random quantization model.

Further, when the second dispersion needs to be determined, the sample operation data can be input into the fixed quantization model to obtain the second output results corresponding to the sample operation data; dispersion calculation is then performed on the second output results to determine the second dispersion.
Step 402, determining a prediction error between each random quantization model and the fixed quantization model according to each first dispersion and each second dispersion.
It should be noted that the prediction error represents the degree of difference between a first dispersion and the second dispersion. The larger the prediction error, the larger the degree of difference, which can be understood as meaning that the similarity between the random quantization model corresponding to that first dispersion and the fixed quantization model is lower; the smaller the prediction error, the smaller the degree of difference, which can be understood as meaning that the similarity between the random quantization model and the fixed quantization model is higher.
In step 403, the random quantization model corresponding to the minimum prediction error is used as the first quantization model.
In one embodiment of the present application, each first dispersion, the second dispersion, and the prediction error between each first dispersion and the second dispersion are determined; the minimum among these prediction errors is found, the first dispersion corresponding to that minimum prediction error is identified, and the random quantization model corresponding to that first dispersion is taken as the first quantization model.
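Steps 401 to 403 can be sketched as follows. The patent does not define the dispersion measure, so population variance is assumed here, with the prediction error taken as the absolute difference between a first dispersion and the second dispersion; both choices are assumptions for illustration:

```python
import statistics

def dispersion(outputs):
    """Dispersion of a model's prediction results on the sample
    operation data; population variance is assumed as the measure."""
    return statistics.pvariance(outputs)

def select_first_quantization_model(random_outputs, fixed_outputs):
    """Pick the random quantization model whose output dispersion is
    closest to that of the fixed quantization model (steps 401-403).

    random_outputs: one list of predictions per random quantization model.
    fixed_outputs: predictions of the fixed quantization model.
    Returns the index of the chosen random quantization model.
    """
    second = dispersion(fixed_outputs)
    # Prediction error between each first dispersion and the second one.
    errors = [abs(dispersion(o) - second) for o in random_outputs]
    # The model with the minimum prediction error is the most similar
    # to the fixed quantization model.
    return errors.index(min(errors))
```

For instance, given fixed-model outputs [0.1, 0.2, 0.3] and two random models producing [0.0, 0.5, 1.0] and [0.1, 0.2, 0.35], the second model's dispersion is closer and it would be selected as the first quantization model.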
According to the network model quantization method, determining the first dispersions and the second dispersion makes it possible to select, from the random quantization models, the first quantization model with the highest similarity to the fixed quantization model, which ensures the smooth progress of the subsequent operation flow, realizes the acquisition of a hybrid-quantization-friendly first quantization model, and ensures the accuracy of the subsequently determined quantization network model. Further, determining the first quantization model through the first dispersions and the second dispersion improves the efficiency of determining the quantization network model, reduces the operation difficulty, reduces the number of training rounds required, and avoids wasting excessive time in the training process.
In an embodiment of the present application, as shown in fig. 5, fig. 5 is a flowchart of another network model quantization method provided in the embodiment of the present application, and when quantization processing needs to be performed on a network model, the method specifically may include the following steps:
Step 501, performing quantization training on the pre-training model according to at least two random quantization precisions and a preset fixed quantization precision, respectively, to obtain a random quantization model corresponding to each random quantization precision and a fixed quantization model corresponding to the fixed quantization precision.
Step 502, determining a first dispersion of the predicted results output by each random quantization model based on the sample operational data, and determining a second dispersion of the predicted results output by the fixed quantization model based on the sample operational data.
In step 503, a prediction error between each random quantization model and the fixed quantization model is determined according to each first dispersion and each second dispersion.
In step 504, the random quantization model corresponding to the minimum prediction error is used as the first quantization model.
And 505, ordering the random quantization precision corresponding to each network layer in the first quantization model.
Step 506, determining the highest quantization precision in the random quantization precision corresponding to each network layer in the first quantization network model according to the sorting result.
Step 507, judging whether the highest quantization precision is the same as the network precision of the actual training model; if not, go to step 508; if so, step 509 is performed.
And step 508, taking the highest quantization precision as a target quantization precision, and taking all network layers of the actual training model as layers to be quantized corresponding to the target quantization precision.
Step 509, determining a locking layer corresponding to the highest quantization precision according to the network layer corresponding to the highest quantization precision in the first quantization model, taking the next random quantization precision with the precision ordered after the highest quantization precision as the target quantization precision, and taking other network layers except the locking layer corresponding to the highest quantization precision in the actual training model as the layers to be quantized corresponding to the target quantization precision.
And 510, carrying out quantization processing on the corresponding layer to be quantized in the actual training model according to the target quantization precision to obtain a quantization model to be adjusted.
Step 511, performing knowledge distillation training on the quantization model to be adjusted based on the sample operation data and the actual training model.
Step 512, determining a locking layer in the layers to be quantized; the locking layers and the network layers corresponding to the target quantization accuracy in the first quantization model have a one-to-one correspondence.
Step 513, judging whether the target quantization precision is the lowest quantization precision in the random quantization precision corresponding to each network layer in the first quantization network model; if not, go to step 514; if yes, go to step 515.
Step 514, taking the random quantization precision ordered next after the target quantization precision, among the random quantization precisions corresponding to the network layers in the first quantization model, as a new target quantization precision, taking the layers to be quantized other than the locking layer as new layers to be quantized, and returning to execute step 510 based on the new target quantization precision and the new layers to be quantized.
Step 515, taking the actual training model after quantization training as the second quantization model.
And step 516, performing knowledge distillation training on the second quantization model according to the actual training model to obtain a quantization network model.
According to the network model quantization method, the random quantization models and the fixed quantization model are determined, so that a data basis is provided for the subsequent determination of the first quantization model; the first quantization model can be determined from the random quantization models according to the fixed quantization model, so that a hybrid-quantization-friendly first quantization model is obtained and the accuracy of the subsequently determined quantization network model is ensured. The second quantization model is determined according to the first quantization model, so that preliminary quantization processing of the actual training model is realized; the quantization precision of each network layer in the second quantization model is the same as that of the corresponding network layer in the first quantization model, which ensures that the quantization precision of each network layer in the subsequently obtained quantization network model meets the actual requirement. By performing knowledge distillation training on the second quantization model, the accuracy and practicability of the quantization network model are ensured, the efficiency of determining the quantization network model is improved, the operation difficulty is reduced, the number of training rounds required is reduced, and excessive time is not wasted in the training process.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and whose order of execution is not necessarily sequential; they may instead be performed in turn or alternately with at least part of the other steps, or with sub-steps or stages of the other steps.
Based on the same inventive concept, the embodiment of the application also provides a network model quantization device for realizing the above-mentioned network model quantization method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the network model quantization device or devices provided below may refer to the limitation of the network model quantization method hereinabove, and will not be repeated herein.
In one embodiment, as shown in fig. 6, fig. 6 is a block diagram of a first network model quantization apparatus according to an embodiment of the present application. The network model quantization apparatus includes: a first training module 10, a determination module 20, a second training module 30, and a third training module 40, wherein:

The first training module 10 is configured to perform quantization training on the pre-training model according to at least two random quantization precisions and a preset fixed quantization precision, so as to obtain a random quantization model corresponding to each random quantization precision and a fixed quantization model corresponding to the fixed quantization precision.

The determination module 20 is configured to determine a first quantization model from the random quantization models based on the prediction error between each random quantization model and the fixed quantization model.

The second training module 30 is configured to perform quantization training on the actual training model according to the random quantization precisions corresponding to the first quantization model, so as to obtain a second quantization model, where the actual training model is obtained by training the pre-training model based on the sample operation data.

The third training module 40 is configured to perform knowledge distillation training on the second quantization model according to the actual training model, to obtain a quantization network model.
According to the network model quantization device, the random quantization models and the fixed quantization model are determined, so that a data basis is provided for the subsequent determination of the first quantization model; the first quantization model can be determined from the random quantization models according to the fixed quantization model, so that a hybrid-quantization-friendly first quantization model is obtained and the accuracy of the subsequently determined quantization network model is ensured. The second quantization model is determined according to the first quantization model, so that preliminary quantization processing of the actual training model is realized; the quantization precision of each network layer in the second quantization model is the same as that of the corresponding network layer in the first quantization model, which ensures that the quantization precision of each network layer in the subsequently obtained quantization network model meets the actual requirement. By performing knowledge distillation training on the second quantization model, the accuracy and practicability of the quantization network model are ensured, the efficiency of determining the quantization network model is improved, the operation difficulty is reduced, the number of training rounds required is reduced, and excessive time is not wasted in the training process.
In one embodiment, as shown in fig. 7, fig. 7 is a block diagram of a second network model quantization apparatus according to an embodiment of the present application, where the second training module 30 includes: a first determining unit 31, a second determining unit 32, and a judging unit 33, wherein:

The first determining unit 31 is configured to sort the random quantization precisions corresponding to the network layers in the first quantization model, determine a target quantization precision from these random quantization precisions according to the sorting result, and determine the layers to be quantized corresponding to the target quantization precision in the actual training model.
It should be noted that the highest quantization precision among the random quantization precisions corresponding to the network layers in the first quantization model is determined according to the sorting result, and it is judged whether this highest quantization precision is the same as the network precision of the actual training model. If they are different, the highest quantization precision is taken as the target quantization precision, and all network layers of the actual training model are taken as the layers to be quantized corresponding to the target quantization precision. If they are the same, a locking layer corresponding to the highest quantization precision is determined according to the network layer corresponding to the highest quantization precision in the first quantization model, the random quantization precision ordered next after the highest quantization precision is taken as the target quantization precision, and the network layers of the actual training model other than the locking layer are taken as the layers to be quantized corresponding to the target quantization precision.
The second determining unit 32 is configured to perform quantization training on the corresponding layers to be quantized in the actual training model according to the target quantization precision, and to determine the locking layer among the layers to be quantized; the locking layers are in one-to-one correspondence with the network layers corresponding to the target quantization precision in the first quantization model.
It should be noted that, according to the target quantization precision, the quantization processing is performed on the corresponding layer to be quantized in the actual training model, so as to obtain the quantization model to be adjusted; based on the sample operation data, training the quantization model to be adjusted.
Further, knowledge distillation training is performed on the quantization model to be adjusted based on the sample operation data and the actual training model.
The judging unit 33 is configured to judge whether the target quantization precision is the lowest among the random quantization precisions corresponding to the network layers in the first quantization model. If not, the random quantization precision ordered next after the target quantization precision is taken as a new target quantization precision, the layers to be quantized other than the locking layer are taken as new layers to be quantized, and the operation of performing quantization training on the corresponding layers to be quantized in the actual training model according to the target quantization precision is executed again based on the new target quantization precision and the new layers to be quantized. If yes, the actual training model after quantization training is taken as the second quantization model.
According to the network model quantization device, determining the sorting result of the random quantization precisions corresponding to the network layers in the first quantization model ensures that the target quantization precision can subsequently be determined from the sorting result, provides a basis for judging whether the target quantization precision is the lowest of the random quantization precisions, and reduces the operation difficulty of quantization training. By determining the layers to be quantized and the locking layers, the quantization precision of each network layer in the second quantization model can be made identical to that of the corresponding network layer in the first quantization model during quantization training at the target quantization precision, which ensures that the quantization precision of each network layer in the subsequently obtained quantization network model meets the actual requirement.
In one embodiment, as shown in fig. 8, fig. 8 is a block diagram of a third network model quantization apparatus provided in the embodiment of the present application, where the determination module 20 includes: a third determining unit 21, a fourth determining unit 22, and a fifth determining unit 23, wherein:
The third determining unit 21 is configured to determine a first dispersion of the prediction results output by each random quantization model based on the sample operation data, and to determine a second dispersion of the prediction results output by the fixed quantization model based on the sample operation data.
A fourth determining unit 22 for determining a prediction error between each random quantization model and the fixed quantization model based on each first dispersion and each second dispersion.
A fifth determining unit 23, configured to take, as the first quantization model, a random quantization model corresponding to the minimum prediction error.
According to the network model quantization device, determining the first dispersions and the second dispersion makes it possible to select, from the random quantization models, the first quantization model with the highest similarity to the fixed quantization model, which ensures the smooth progress of the subsequent operation flow, realizes the acquisition of a hybrid-quantization-friendly first quantization model, and ensures the accuracy of the subsequently determined quantization network model. Further, determining the first quantization model through the first dispersions and the second dispersion improves the efficiency of determining the quantization network model, reduces the operation difficulty, reduces the number of training rounds required, and avoids wasting excessive time in the training process.
Each of the modules in the above network model quantization apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, a processor in the computer device, or may be stored in software form in a memory in the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal whose internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface, the display unit, and the input device are connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and an external device. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode may be realized through Wi-Fi, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a network model quantization method. The display unit of the computer device is used to form a visual picture and may be a display screen, a projection device, or a virtual reality imaging device; the display screen may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of a portion of the structure associated with the solution of the present application and does not constitute a limitation on the computer device to which the present application applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
performing quantization training on the pre-training model according to at least two random quantization precisions and a preset fixed quantization precision, respectively, to obtain a random quantization model corresponding to each random quantization precision and a fixed quantization model corresponding to the fixed quantization precision;
determining a first quantization model from among the random quantization models according to the prediction error between each random quantization model and the fixed quantization model;
performing quantization training on an actual training model according to the random quantization precision corresponding to the first quantization model to obtain a second quantization model, wherein the actual training model is obtained by training the pre-training model based on sample operation data;
and performing knowledge distillation training on the second quantization model according to the actual training model to obtain a quantization network model.
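The four steps above can be sketched as a single control flow. This is an illustrative sketch only: `quantize_train`, `prediction_error`, and `distill` are hypothetical stand-ins (the patent does not prescribe these signatures), and the toy implementations exist solely to exercise the flow; "precision" here stands in for the possibly per-layer random quantization precision:

```python
# Hypothetical end-to-end sketch of the four steps executed by the processor.
def run_quantization_pipeline(pretrained, actual_model, sample_data,
                              random_precisions, fixed_precision,
                              quantize_train, prediction_error, distill):
    # Step 1: one random quantization model per random precision,
    # plus one fixed quantization model at the preset fixed precision.
    random_models = [quantize_train(pretrained, p) for p in random_precisions]
    fixed_model = quantize_train(pretrained, fixed_precision)

    # Step 2: the first quantization model is the random model that
    # minimizes the prediction error w.r.t. the fixed quantization model.
    first = min(random_models,
                key=lambda m: prediction_error(m, fixed_model, sample_data))

    # Step 3: quantize-train the actual training model at the precision
    # of the first quantization model, yielding the second quantization model.
    second = quantize_train(actual_model, first["precision"])

    # Step 4: knowledge distillation with the actual model as teacher.
    return distill(teacher=actual_model, student=second, data=sample_data)

# Toy stand-ins, just to exercise the control flow.
def quantize_train(model, precision):
    return {"precision": precision, "base": model}

def prediction_error(m, fixed, data):
    return abs(m["precision"] - fixed["precision"])

def distill(teacher, student, data):
    return {"student": student, "teacher": teacher}

result = run_quantization_pipeline("pretrained", "actual", [],
                                   [4, 8, 16], 8,
                                   quantize_train, prediction_error, distill)
```

In a real setting the stand-ins would be replaced by actual quantization-aware training, the dispersion-based error of the later embodiments, and a distillation loop over the sample operation data.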
In one embodiment, the processor when executing the computer program further performs the steps of:
sorting the random quantization precisions corresponding to the network layers in the first quantization model, determining a target quantization precision from the random quantization precisions according to the sorting result, and determining a layer to be quantized corresponding to the target quantization precision in the actual training model;
performing quantization training on the corresponding layer to be quantized in the actual training model according to the target quantization precision, and determining a locking layer in the layer to be quantized, the locking layer having a one-to-one correspondence with the network layer corresponding to the target quantization precision in the first quantization model;
judging whether the target quantization precision is the lowest quantization precision among the random quantization precisions corresponding to the network layers in the first quantization model;
if not, taking the next random quantization precision after the target quantization precision as a new target quantization precision, taking the layers to be quantized other than the locking layer as new layers to be quantized, and returning to the operation of performing quantization training on the corresponding layer to be quantized in the actual training model according to the target quantization precision, based on the new target quantization precision and the new layers to be quantized;
if yes, taking the actual training model after quantization training as the second quantization model.
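The iterative procedure above — sorting the precisions, quantization-training the remaining layers at the current target precision, locking the layers whose precision in the first quantization model matches that target, and moving to the next lower precision until the lowest one — can be sketched as follows. The per-layer data representation (a list mapping layer index to precision) is illustrative, not prescribed by the text:

```python
# Hypothetical sketch of the precision-by-precision quantization loop.
def iterative_quantization(layer_precisions):
    """layer_precisions: per-layer random precisions taken from the first
    quantization model. Returns the final per-layer precision assignment
    of the second quantization model."""
    ordered = sorted(set(layer_precisions), reverse=True)  # sorting result
    to_quantize = set(range(len(layer_precisions)))        # layers to be quantized
    assignment = {}
    for target in ordered:                      # target quantization precision
        for layer in sorted(to_quantize):
            assignment[layer] = target          # quantization training pass
        locked = {i for i in to_quantize        # lock layers whose precision
                  if layer_precisions[i] == target}   # matches the target
        to_quantize -= locked                   # remaining layers go lower
        if target == ordered[-1]:               # lowest precision reached:
            break                               # the model is the second one
    return assignment

# Layers 0..3 carry mixed precisions 8/4/8/2 in the first quantization model.
final = iterative_quantization([8, 4, 8, 2])
```

Tracing the toy input: all four layers are first trained at 8 bits and layers 0 and 2 lock there; layers 1 and 3 are retrained at 4 bits and layer 1 locks; layer 3 finally lands at 2 bits.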
In one embodiment, the processor when executing the computer program further performs the steps of:
determining the highest quantization precision among the random quantization precisions corresponding to the network layers in the first quantization model according to the sorting result;
judging whether the highest quantization precision is the same as the network precision of the actual training model;
if they are different, taking the highest quantization precision as the target quantization precision, and taking all network layers of the actual training model as the layers to be quantized corresponding to the target quantization precision;
if they are the same, determining a locking layer corresponding to the highest quantization precision according to the network layer corresponding to the highest quantization precision in the first quantization model, taking the next random quantization precision sorted after the highest quantization precision as the target quantization precision, and taking the network layers of the actual training model other than that locking layer as the layers to be quantized corresponding to the target quantization precision.
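One way to read the branch above is sketched below: if the highest random precision already equals the network precision of the actual training model, its layers are locked immediately and the next precision becomes the target; otherwise every layer is first quantized at the highest precision. Function and variable names are hypothetical, and the "same" branch assumes at least two distinct precisions exist:

```python
# Hypothetical sketch of choosing the initial target precision and the
# initial set of layers to be quantized.
def initial_target(layer_precisions, network_precision):
    ordered = sorted(set(layer_precisions), reverse=True)
    highest = ordered[0]
    if highest != network_precision:
        # Different: quantize all layers at the highest precision first.
        return highest, set(range(len(layer_precisions)))
    # Same: lock the layers already at the network precision and start
    # from the next precision in the sorted order (assumes it exists).
    locked = {i for i, p in enumerate(layer_precisions) if p == highest}
    target = ordered[1]
    to_quantize = set(range(len(layer_precisions))) - locked
    return target, to_quantize

# A 32-bit actual training model whose first quantization model keeps
# layers 0 and 2 at 32 bits: those lock, and 8 bits becomes the target.
target, layers = initial_target([32, 8, 32, 4], network_precision=32)
```

When the highest precision differs from the network precision (e.g. an all-8/4-bit assignment against a 32-bit model), the function instead returns the highest precision with every layer still to be quantized.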
In one embodiment, the processor when executing the computer program further performs the steps of:
performing quantization processing on the corresponding layer to be quantized in the actual training model according to the target quantization precision to obtain a quantization model to be adjusted;
training the quantization model to be adjusted based on the sample operation data.
In one embodiment, the processor when executing the computer program further performs the steps of:
and carrying out knowledge distillation training on the quantization model to be adjusted based on the sample operation data and the actual training model.
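As a hedged illustration of this distillation step: the student (the quantization model to be adjusted) can be trained to match the teacher's (the actual training model's) outputs on the sample operation data. A mean-squared-error objective between the two output vectors is one common choice; the patent does not specify the loss, so the function below is an assumption:

```python
# Illustrative distillation objective, not prescribed by the patent text.
def distillation_loss(teacher_outputs, student_outputs):
    """Mean squared error between teacher and student prediction vectors."""
    assert len(teacher_outputs) == len(student_outputs)
    return sum((t - s) ** 2
               for t, s in zip(teacher_outputs, student_outputs)) / len(teacher_outputs)

# Teacher soft outputs vs. the quantized student's outputs on one sample.
loss = distillation_loss([0.9, 0.1, 0.0], [0.7, 0.2, 0.1])
```

In practice this term would be minimized over the sample operation data, optionally combined with the ordinary task loss, so that the second quantization model inherits the actual training model's behavior.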
In one embodiment, the processor when executing the computer program further performs the steps of:
determining a first dispersion of the prediction results output by each random quantization model based on the sample operation data, and determining a second dispersion of the prediction results output by the fixed quantization model based on the sample operation data;
determining a prediction error between each random quantization model and the fixed quantization model according to each first dispersion and each second dispersion;
taking the random quantization model corresponding to the minimum prediction error as the first quantization model.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
performing quantization training on the pre-training model according to at least two random quantization precisions and a preset fixed quantization precision, respectively, to obtain a random quantization model corresponding to each random quantization precision and a fixed quantization model corresponding to the fixed quantization precision;
determining a first quantization model from among the random quantization models according to the prediction error between each random quantization model and the fixed quantization model;
performing quantization training on an actual training model according to the random quantization precision corresponding to the first quantization model to obtain a second quantization model, wherein the actual training model is obtained by training the pre-training model based on sample operation data;
and performing knowledge distillation training on the second quantization model according to the actual training model to obtain a quantization network model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
sorting the random quantization precisions corresponding to the network layers in the first quantization model, determining a target quantization precision from the random quantization precisions according to the sorting result, and determining a layer to be quantized corresponding to the target quantization precision in the actual training model;
performing quantization training on the corresponding layer to be quantized in the actual training model according to the target quantization precision, and determining a locking layer in the layer to be quantized, the locking layer having a one-to-one correspondence with the network layer corresponding to the target quantization precision in the first quantization model;
judging whether the target quantization precision is the lowest quantization precision among the random quantization precisions corresponding to the network layers in the first quantization model;
if not, taking the next random quantization precision after the target quantization precision as a new target quantization precision, taking the layers to be quantized other than the locking layer as new layers to be quantized, and returning to the operation of performing quantization training on the corresponding layer to be quantized in the actual training model according to the target quantization precision, based on the new target quantization precision and the new layers to be quantized;
if yes, taking the actual training model after quantization training as the second quantization model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining the highest quantization precision among the random quantization precisions corresponding to the network layers in the first quantization model according to the sorting result;
judging whether the highest quantization precision is the same as the network precision of the actual training model;
if they are different, taking the highest quantization precision as the target quantization precision, and taking all network layers of the actual training model as the layers to be quantized corresponding to the target quantization precision;
if they are the same, determining a locking layer corresponding to the highest quantization precision according to the network layer corresponding to the highest quantization precision in the first quantization model, taking the next random quantization precision sorted after the highest quantization precision as the target quantization precision, and taking the network layers of the actual training model other than that locking layer as the layers to be quantized corresponding to the target quantization precision.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing quantization processing on the corresponding layer to be quantized in the actual training model according to the target quantization precision to obtain a quantization model to be adjusted;
training the quantization model to be adjusted based on the sample operation data.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and carrying out knowledge distillation training on the quantization model to be adjusted based on the sample operation data and the actual training model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a first dispersion of the prediction results output by each random quantization model based on the sample operation data, and determining a second dispersion of the prediction results output by the fixed quantization model based on the sample operation data;
determining a prediction error between each random quantization model and the fixed quantization model according to each first dispersion and each second dispersion;
taking the random quantization model corresponding to the minimum prediction error as the first quantization model.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:
performing quantization training on the pre-training model according to at least two random quantization precisions and a preset fixed quantization precision, respectively, to obtain a random quantization model corresponding to each random quantization precision and a fixed quantization model corresponding to the fixed quantization precision;
determining a first quantization model from among the random quantization models according to the prediction error between each random quantization model and the fixed quantization model;
performing quantization training on an actual training model according to the random quantization precision corresponding to the first quantization model to obtain a second quantization model, wherein the actual training model is obtained by training the pre-training model based on sample operation data;
and performing knowledge distillation training on the second quantization model according to the actual training model to obtain a quantization network model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
sorting the random quantization precisions corresponding to the network layers in the first quantization model, determining a target quantization precision from the random quantization precisions according to the sorting result, and determining a layer to be quantized corresponding to the target quantization precision in the actual training model;
performing quantization training on the corresponding layer to be quantized in the actual training model according to the target quantization precision, and determining a locking layer in the layer to be quantized, the locking layer having a one-to-one correspondence with the network layer corresponding to the target quantization precision in the first quantization model;
judging whether the target quantization precision is the lowest quantization precision among the random quantization precisions corresponding to the network layers in the first quantization model;
if not, taking the next random quantization precision after the target quantization precision as a new target quantization precision, taking the layers to be quantized other than the locking layer as new layers to be quantized, and returning to the operation of performing quantization training on the corresponding layer to be quantized in the actual training model according to the target quantization precision, based on the new target quantization precision and the new layers to be quantized;
if yes, taking the actual training model after quantization training as the second quantization model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining the highest quantization precision among the random quantization precisions corresponding to the network layers in the first quantization model according to the sorting result;
judging whether the highest quantization precision is the same as the network precision of the actual training model;
if they are different, taking the highest quantization precision as the target quantization precision, and taking all network layers of the actual training model as the layers to be quantized corresponding to the target quantization precision;
if they are the same, determining a locking layer corresponding to the highest quantization precision according to the network layer corresponding to the highest quantization precision in the first quantization model, taking the next random quantization precision sorted after the highest quantization precision as the target quantization precision, and taking the network layers of the actual training model other than that locking layer as the layers to be quantized corresponding to the target quantization precision.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing quantization processing on the corresponding layer to be quantized in the actual training model according to the target quantization precision to obtain a quantization model to be adjusted;
training the quantization model to be adjusted based on the sample operation data.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and carrying out knowledge distillation training on the quantization model to be adjusted based on the sample operation data and the actual training model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a first dispersion of the prediction results output by each random quantization model based on the sample operation data, and determining a second dispersion of the prediction results output by the fixed quantization model based on the sample operation data;
determining a prediction error between each random quantization model and the fixed quantization model according to each first dispersion and each second dispersion;
taking the random quantization model corresponding to the minimum prediction error as the first quantization model.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.
Those skilled in the art will appreciate that all or part of the methods in the above embodiments may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the computer program may include the flows of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. The volatile memory may include Random Access Memory (RAM) or external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, or data processing logic units based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, they should be considered within the scope of this specification.
The foregoing examples represent only a few embodiments of the present application, which are described specifically and in detail, but are not thereby to be construed as limiting the scope of the present application. It should be noted that those of ordinary skill in the art may make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A network model quantization method, the method comprising:
performing quantization training on a pre-training model according to at least two random quantization precisions and a preset fixed quantization precision, respectively, to obtain a random quantization model corresponding to each random quantization precision and a fixed quantization model corresponding to the fixed quantization precision;
determining a first quantization model from among the random quantization models according to a prediction error between each random quantization model and the fixed quantization model;
performing quantization training on an actual training model according to the random quantization precision corresponding to the first quantization model to obtain a second quantization model, wherein the actual training model is obtained by training the pre-training model based on sample operation data;
and performing knowledge distillation training on the second quantization model according to the actual training model to obtain a quantization network model.
2. The method of claim 1, wherein the first quantization model and the actual training model include the same number of network layers in one-to-one correspondence, and the performing quantization training on the actual training model according to the random quantization precision corresponding to the first quantization model to obtain a second quantization model includes:
sorting the random quantization precisions corresponding to the network layers in the first quantization model, determining a target quantization precision from the random quantization precisions according to the sorting result, and determining a layer to be quantized corresponding to the target quantization precision in the actual training model;
performing quantization training on the corresponding layer to be quantized in the actual training model according to the target quantization precision, and determining a locking layer in the layer to be quantized, the locking layer having a one-to-one correspondence with the network layer corresponding to the target quantization precision in the first quantization model;
judging whether the target quantization precision is the lowest quantization precision among the random quantization precisions corresponding to the network layers in the first quantization model;
if not, taking the next random quantization precision after the target quantization precision as a new target quantization precision, taking the layers to be quantized other than the locking layer as new layers to be quantized, and returning to the operation of performing quantization training on the corresponding layer to be quantized in the actual training model according to the target quantization precision, based on the new target quantization precision and the new layers to be quantized;
and if yes, taking the actual training model after quantization training as the second quantization model.
3. The method according to claim 2, wherein the determining, according to the sorting result, a target quantization precision from the random quantization precision corresponding to each network layer in the first quantization model, and determining a layer to be quantized corresponding to the target quantization precision in the actual training model, includes:
determining the highest quantization precision among the random quantization precisions corresponding to the network layers in the first quantization model according to the sorting result;
judging whether the highest quantization precision is the same as the network precision of the actual training model;
if they are different, taking the highest quantization precision as the target quantization precision, and taking all network layers of the actual training model as the layers to be quantized corresponding to the target quantization precision;
and if they are the same, determining a locking layer corresponding to the highest quantization precision according to the network layer corresponding to the highest quantization precision in the first quantization model, taking the next random quantization precision sorted after the highest quantization precision as the target quantization precision, and taking the network layers of the actual training model other than that locking layer as the layers to be quantized corresponding to the target quantization precision.
4. The method according to claim 2, wherein the performing quantization training on the corresponding layer to be quantized in the actual training model according to the target quantization precision comprises:
performing quantization processing on the corresponding layer to be quantized in the actual training model according to the target quantization precision to obtain a quantization model to be adjusted;
and training the quantization model to be adjusted based on the sample operation data.
5. The method of claim 4, wherein the training the quantization model to be adjusted based on the sample operation data comprises:
and carrying out knowledge distillation training on the quantization model to be adjusted based on the sample operation data and the actual training model.
6. The method of claim 1, wherein the determining a first quantization model from among the random quantization models according to the prediction error between each random quantization model and the fixed quantization model comprises:
determining a first dispersion of the prediction results output by each random quantization model based on the sample operation data, and determining a second dispersion of the prediction results output by the fixed quantization model based on the sample operation data;
determining the prediction error between each random quantization model and the fixed quantization model according to each first dispersion and each second dispersion;
and taking the random quantization model corresponding to the minimum prediction error as the first quantization model.
7. A network model quantization apparatus, the apparatus comprising:
a first training module, configured to perform quantization training on a pre-training model according to at least two random quantization precisions and a preset fixed quantization precision, respectively, to obtain a random quantization model corresponding to each random quantization precision and a fixed quantization model corresponding to the fixed quantization precision;
a determination module, configured to determine a first quantization model from among the random quantization models according to a prediction error between each random quantization model and the fixed quantization model;
a second training module, configured to perform quantization training on an actual training model according to the random quantization precision corresponding to the first quantization model to obtain a second quantization model, wherein the actual training model is obtained by training the pre-training model based on sample operation data;
and a third training module, configured to perform knowledge distillation training on the second quantization model according to the actual training model to obtain a quantization network model.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202211647925.6A 2022-12-21 2022-12-21 Network model quantization method, device, computer equipment and storage medium thereof Pending CN116205281A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211647925.6A CN116205281A (en) 2022-12-21 2022-12-21 Network model quantization method, device, computer equipment and storage medium thereof

Publications (1)

Publication Number Publication Date
CN116205281A (en) 2023-06-02

Family

ID=86513804



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination