WO2023050707A1 - Network model quantization method and apparatus, and computer device and storage medium - Google Patents


Info

Publication number
WO2023050707A1
WO2023050707A1, PCT/CN2022/078256, CN2022078256W
Authority
WO
WIPO (PCT)
Prior art keywords
network model
initial
model
result
preprocessing
Prior art date
Application number
PCT/CN2022/078256
Other languages
French (fr)
Chinese (zh)
Inventor
梁玲燕
董刚
赵雅倩
温东超
Original Assignee
苏州浪潮智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023050707A1 publication Critical patent/WO2023050707A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to a network model quantization method and apparatus, a computer device, and a storage medium.
  • Neural network models are usually deployed on terminal or edge devices, which generally have limited computing power, memory, and power budgets. How to shrink a large deep neural network model, while preserving its accuracy, so that it can actually be deployed on such devices has therefore become an urgent problem.
  • Model compression methods such as quantization and pruning are usually used to reduce the size of large deep neural network models.
  • The embodiments of the present application provide a network model quantization method, apparatus, computer device, and storage medium to address the problem that shrinking a large deep neural network model through model compression such as quantization and pruning leaves the resulting deep neural network model with relatively low accuracy.
  • An embodiment of the present application provides a network model quantization method. The method includes: obtaining the network model to be processed, the network model to be processed being a pre-trained full-precision network model; quantizing the weight parameters and the activation outputs of the network model to be processed according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation outputs, and constructing the initial network model based on the initial weight parameters and the initial quantization parameters of the activation outputs; obtaining a first calibration network model whose accuracy is higher than that of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and obtaining a second calibration network model whose accuracy is higher than that of the initial network model, and adjusting the initial quantization parameters of the activation outputs of the first preprocessing model based on the second calibration network model to obtain a target network model.
  • In this method, the pre-trained full-precision network model is first obtained and used as the network model to be processed; the weight parameters and activation outputs of the network model to be processed are then quantized according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation outputs, and the initial network model is constructed based on them. Because the weights and activation outputs of the network model to be processed have been quantized, the initial network model constructed from the initial weight parameters and the initial quantization parameters of the activation outputs is much smaller than the network model to be processed, which ensures that the initial network model can run on terminal and edge devices.
  • The initial weight parameters of the initial network model can then be adjusted based on the first calibration network model, whose accuracy is higher than that of the initial network model, to obtain the first preprocessing model, so that the accuracy of the weight parameters of the first preprocessing model is guaranteed and the accuracy of the first preprocessing model is improved.
  • The initial quantization parameters of the activation outputs of the first preprocessing model can likewise be adjusted based on the second calibration network model to obtain the target network model.
  • As a result, the weight parameters and the activation output range of the target network model are more accurate, which further improves the accuracy of the target network model and solves the problem that shrinking a large deep neural network model through quantization, pruning, and other model compression methods lowers the accuracy of the reduced model.
  • Adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model includes: using a knowledge-distillation-based learning method to adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • The knowledge-distillation-based learning method uses the first calibration network model as a large teacher network to guide the learning of the small quantized initial network model, so that better model parameters are obtained; the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model. The accuracy of the weight parameters of the first preprocessing model is thereby guaranteed, and the accuracy of the first preprocessing model is improved.
  • Using the knowledge-distillation-based learning method to adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model includes: obtaining a first training image set, the first training image set carrying hard labels; inputting the first training image set into the initial network model and outputting a first result; inputting the first training image set into the first calibration network model and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
  • The first training image set, which carries hard labels, is input into the initial network model and the first calibration network model, producing the first result and the second result respectively. Using the relationship between the first result and the second result, and between the first result and the hard labels, the initial weight parameters of the initial network model are adjusted so that the first result output by the initial network model moves closer to the second result and the hard labels, which ensures that the first preprocessing model obtained after the weight parameter adjustment has improved accuracy.
  • Adjusting the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model includes: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; generating a first target loss function from the first loss function and the second loss function; and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • The first loss function is generated from the first result output by the initial network model and the hard labels of the first training image set, and the second loss function is generated from the first result output by the initial network model and the second result output by the first calibration network model.
  • The first loss function represents the gap between the first result and the hard labels, and the second loss function represents the gap between the first result and the second result. The first target loss function generated from them therefore characterizes both gaps.
  • Adjusting the initial weight parameters of the initial network model based on the first target loss function thus yields a first preprocessing model with improved accuracy.
  • Adjusting the initial quantization parameters of the activation outputs of the first preprocessing model based on the second calibration network model to obtain the target network model includes: using a knowledge-distillation-based learning method to adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model; and adjusting the initial quantization parameters of the activation outputs of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  • The knowledge-distillation-based learning method uses the second calibration network model as a large teacher network to guide the learning of the small quantized first preprocessing model, so that better model parameters are obtained.
  • Adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model ensures the accuracy of the adjusted activation quantization threshold.
  • Adjusting the initial quantization parameters of the activation outputs of the first preprocessing model according to the adjusted threshold to obtain the target network model further ensures the accuracy of those quantization parameters, thereby improving the accuracy of the resulting target network model.
  • Using the knowledge-distillation-based learning method to adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model includes: obtaining a second training image set; inputting the second training image set into the first preprocessing model and outputting a third result; inputting the second training image set into the second calibration network model and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • The second training image set is input into the first preprocessing model and the second calibration network model, producing the third result and the fourth result respectively, and the activation quantization threshold of the first preprocessing model is adjusted based on them, which ensures the accuracy of the adjusted activation quantization threshold and, in turn, the accuracy of the resulting target network model.
  • Adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result includes: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • Adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function ensures the accuracy of the adjusted threshold and, consequently, the accuracy of the activation-output quantization parameters calculated from it, thereby improving the accuracy of the target network model.
  • An embodiment of the present application provides a network model quantization apparatus, which includes:
  • a quantization processing module, configured to obtain the network model to be processed, the network model to be processed being a pre-trained full-precision network model, quantize the weight parameters and the activation outputs of the network model to be processed according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation outputs, and construct the initial network model based on the initial weight parameters and the initial quantization parameters of the activation outputs;
  • a first adjustment module, configured to obtain the first calibration network model, whose accuracy is higher than that of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model;
  • a second adjustment module, configured to obtain the second calibration network model, whose accuracy is higher than that of the initial network model, and adjust the initial quantization parameters of the activation outputs of the first preprocessing model based on the second calibration network model to obtain the target network model.
  • The above first adjustment module is specifically configured to use a knowledge-distillation-based learning method to adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • The above first adjustment module is specifically configured to: obtain the first training image set, the first training image set carrying hard labels; input the first training image set into the initial network model and output the first result; input the first training image set into the first calibration network model and output the second result; and adjust the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
  • The above first adjustment module is specifically configured to: generate the first loss function based on the first result and the hard labels; generate the second loss function based on the first result and the second result; generate the first target loss function from the first loss function and the second loss function; and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • The above second adjustment module includes:
  • a first adjustment unit, configured to use the knowledge-distillation-based learning method to adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model;
  • a second adjustment unit, configured to adjust the initial quantization parameters of the activation outputs of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  • The above first adjustment unit is specifically configured to: acquire the second training image set; input the second training image set into the first preprocessing model and output the third result; input the second training image set into the second calibration network model and output the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • The above first adjustment unit is specifically configured to: generate the second target loss function based on the third result and the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • An embodiment of the present application provides an electronic device/mobile terminal/server, including a memory and a processor that are communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions so as to perform the network model quantization method in the first aspect or any implementation manner of the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to execute the network model quantization method in the first aspect or any implementation manner of the first aspect.
  • An embodiment of the present application provides a computer program product, which includes a computer program stored on a computer-readable storage medium; the computer program includes program instructions that, when executed by a computer, cause the computer to execute the network model quantization method in the first aspect or any implementation manner of the first aspect.
  • Fig. 1 shows a flowchart of the steps of the network model quantization method in one embodiment;
  • Fig. 2a shows a schematic diagram of unsaturated mapping in PTQ model quantization in the network model quantization method in one embodiment;
  • Fig. 2b shows a schematic diagram of saturated mapping in PTQ model quantization in the network model quantization method in one embodiment;
  • Fig. 3 shows a flowchart of the steps of the network model quantization method in another embodiment;
  • Fig. 4 shows a schematic diagram of the process of adjusting the initial weight parameters of the initial network model in the network model quantization method in one embodiment;
  • Fig. 5 shows a flowchart of the steps of the network model quantization method in another embodiment;
  • Fig. 6 shows a flowchart of the steps of the network model quantization method in another embodiment;
  • Fig. 7 shows a flowchart of the steps of the network model quantization method in another embodiment;
  • Fig. 8 shows a schematic diagram of the process of adjusting the activation output threshold of the first preprocessing model in the network model quantization method in another embodiment;
  • Fig. 9 shows a flowchart of the steps of the network model quantization method in another embodiment;
  • Fig. 10 shows a flowchart of the steps of the network model quantization method in another embodiment;
  • Fig. 11 shows a schematic flowchart of the network model quantization method in another embodiment;
  • Fig. 12 shows a structural block diagram of the network model quantization apparatus in an embodiment;
  • Fig. 13 shows a structural block diagram of the network model quantization apparatus in an embodiment;
  • Fig. 14 shows an internal structure diagram of the computer device in an embodiment in which the computer device is a server;
  • Fig. 15 shows an internal structure diagram of the computer device in an embodiment in which the computer device is a terminal.
  • The network model quantization method provided by the embodiments of the present application can be executed by a network model quantization apparatus, and the apparatus can be implemented as a computer device through software, hardware, or a combination of software and hardware.
  • The computer device may be a server or a terminal.
  • The server in the embodiments of the present application may be a single server or a server cluster composed of multiple servers.
  • The terminal in the embodiments of the present application may be a smartphone, a personal computer, a tablet computer, a wearable device, or another intelligent hardware device such as an intelligent robot.
  • In the following, the method is described by taking a computer device as the execution subject as an example.
  • A network model quantization method is provided; the method is described by taking its application to a computer device as an example, and includes the following steps:
  • Step 101: obtain the network model to be processed, the network model to be processed being a pre-trained full-precision network model; quantize the weight parameters and the activation outputs of the network model to be processed according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation outputs; and construct the initial network model based on the initial weight parameters and the initial quantization parameters of the activation outputs.
  • the computer device can use the first target image training set to train the neural network model, and obtain the network model to be processed.
  • the network model to be processed is a pre-trained full-precision network model.
  • the network model to be processed can be used to process tasks such as image recognition, image detection, and image classification.
  • the embodiment of the present application does not specifically limit the application scenarios of the network model to be processed.
  • the computer device may also receive a network model to be processed sent by other devices or a network model to be processed input by a user.
  • the embodiment of the present application does not specifically limit the manner in which the computer device acquires the network model to be processed.
  • The computer device quantizes the weight parameters and the activation outputs of the network model to be processed according to the quantization requirements, obtains the initial weight parameters and the initial quantization parameters of the activation outputs, and builds the initial network model based on them.
  • The quantization requirement may be input by the user through an input component of the computer device.
  • The quantization requirement can be changed according to the actual situation; it specifies the bit-width requirements of the weight parameters and the activation outputs.
  • For example, the quantization requirement may be to reduce the size of the network model to be processed by a factor of four by converting its weight parameters and activation outputs from float32 to int8 (a float32 value occupies 4 bytes while an int8 value occupies 1 byte).
  • The embodiment of the present application does not specifically limit the quantization requirement.
  • The accuracy of the initial network model is much lower than that of the network model to be processed, and its size is also much smaller than that of the network model to be processed.
  • The computer device may use a post-training quantization (PTQ) method or a training-aware quantization (TAQ) method to quantize the weight parameters and the activation outputs of the network model to be processed.
  • the embodiment of the present application does not specifically limit the method of separately quantizing the weight parameter and the activation output of the network model to be processed.
  • The following explanation uses the PTQ method to quantize the weight parameters and the activation outputs of the network model to be processed.
  • The central idea of the PTQ quantization method is to calculate a quantization threshold T and, according to T, determine the mapping between the weights of the network model to be processed and the weights of the initial network model, and between the activation outputs of the network model to be processed and the activation outputs of the initial network model.
  • The mapping includes saturated mapping and unsaturated mapping.
  • In some cases, the unsaturated mapping shown in Fig. 2a is used, in which the quantization threshold T is equal to the maximum value.
  • In other cases, a saturated mapping is generally used, as shown in Fig. 2b.
  • The quantization threshold T in saturated mapping can be searched for using relative entropy (KL divergence) or the mean-square-error method. The criterion for finding the quantization threshold T is to find a threshold such that, after the original values are clipped at it, the difference from the original values is still the smallest.
  • The part exceeding the threshold T is clipped, as shown in the second case of formula (1).
  • In formula (1), s is the quantization mapping scale factor, x is the original value, q(x, T) is the value of x after quantization and de-quantization, n is the bit width to be quantized, and T is the quantization threshold.
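  • Formula (1) itself is not reproduced in this text. The following Python/NumPy sketch shows a typical saturated-mapping quantize/de-quantize operation consistent with the variable definitions above, together with a simple mean-square-error search for the threshold T; the function and variable names are illustrative, and the exact form of formula (1) in the original application may differ.

```python
import numpy as np

def quantize_dequantize(x, T, n=8):
    """Saturated-mapping quantize/de-quantize sketch.

    x: original floating-point values, T: quantization threshold,
    n: target bit width. Values with |x| > T are clipped to the
    threshold before being mapped onto the n-bit integer grid.
    """
    s = T / (2 ** (n - 1) - 1)        # quantization mapping scale factor
    x_clipped = np.clip(x, -T, T)     # the part exceeding T is clipped
    q = np.round(x_clipped / s)       # integer code on the n-bit grid
    return q * s                      # de-quantized approximation of x

def search_threshold_mse(x, candidates):
    """Pick T by minimizing the mean-square error between x and q(x, T),
    one of the search criteria mentioned above."""
    errors = [np.mean((x - quantize_dequantize(x, T)) ** 2) for T in candidates]
    return candidates[int(np.argmin(errors))]

# Illustrative usage on random data
x = np.random.randn(1000).astype(np.float32)
T = search_threshold_mse(x, candidates=np.linspace(0.5, 4.0, 50))
x_q = quantize_dequantize(x, T)
```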
  • Step 102: obtain the first calibration network model, the accuracy of which is higher than that of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model.
  • That the accuracy of the first calibration network model is higher than that of the initial network model means at least one of the following: the performance accuracy of the first calibration network model is higher than that of the initial network model, or the bit-width precision of the parameters of the first calibration network model is higher than that of the parameters of the initial network model.
  • the computer device may use the second target image training set to train the neural network model to obtain the first calibration network model.
  • the precision of the first calibration network model is higher than the precision of the initial network model.
  • the first calibration network model can be used for image recognition, image detection and image classification task processing. The embodiment of the present application does not specifically limit the application scenario of the first calibration network model.
  • the computer device may also receive the first calibration network model sent by other devices or the first calibration network model input by the user.
  • The embodiment of this application does not specifically limit the manner in which the computer device obtains the first calibration network model.
  • the computer device may adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • the computer device may also compare the output result of the first calibration network model with the output result of the initial network model, adjust the initial weight parameters of the initial network model according to the comparison result, and obtain the first preprocessing model.
  • In step 101, after the full-precision network is converted to the low-precision initial network model, the drop in model performance accuracy generally comes from two sources: the change of the weight parameters and the selection of the activation thresholds. In post-training quantization, all weight parameters are usually truncated with the same approximation method, but that method may not suit every weight parameter, which implicitly introduces noise and degrades the feature extraction ability of the network model.
  • the first calibration network model is used to correct the initial weight parameters of the initial network model to reduce errors generated in the above process.
  • Step 103: obtain the second calibration network model, the accuracy of which is higher than that of the first preprocessing model, and adjust the initial quantization parameters of the activation outputs of the first preprocessing model based on the second calibration network model to obtain the target network model.
  • That the accuracy of the second calibration network model is higher than that of the first preprocessing model means at least one of the following: the performance accuracy of the second calibration network model is higher than that of the first preprocessing model, or the bit-width precision of the parameters of the second calibration network model is higher than that of the parameters of the first preprocessing model.
  • the computer device may use the third target image training set to train the neural network model, and obtain the second calibration network model.
  • the accuracy of the second calibration network model is higher than the accuracy of the first preprocessing model.
  • the second calibration network model can be used for image recognition, image detection and image classification task processing. The embodiment of the present application does not specifically limit the application scenario of the second calibration network model.
  • the computer device can also receive the second calibration network model sent by other devices or receive the second calibration network model input by the user.
  • The embodiment of this application does not specifically limit the manner in which the computer device obtains the second calibration network model.
  • the second calibration network model may be the same pre-trained full-precision network model as the network model to be processed, or may be a different pre-trained full-precision network model.
  • the computer device may adjust the initial quantization parameters of the activation output of the first preprocessing model according to the second calibration network model to obtain the target network model.
  • To further adjust the initial activation threshold, the computer device can also compare the output results of the second calibration network model with the output results of the first preprocessing model, and adjust the initial quantization parameters of the activation outputs of the first preprocessing model according to the comparison results to obtain the target network model, thereby further reducing the loss incurred when converting the full-precision model to a low-precision model and improving the accuracy of the model.
  • The initial network model is constructed based on the initial weight parameters and the initial quantization parameters of the activation outputs. Because the weight parameters and the activation outputs of the network model to be processed have been quantized, the initial network model constructed from them is much smaller than the network model to be processed, which ensures that the initial network model can run on terminal and edge devices.
  • The initial weight parameters of the initial network model can be adjusted based on the first calibration network model, whose accuracy is higher than that of the initial network model, to obtain the first preprocessing model, so that the accuracy of the weight parameters of the first preprocessing model is guaranteed and the accuracy of the first preprocessing model is improved.
  • The initial quantization parameters of the activation outputs of the first preprocessing model may then be adjusted based on the second calibration network model, whose accuracy is higher than that of the first preprocessing model, to obtain the target network model.
  • As a result, the weight parameters and the activation outputs of the target network model are more accurate, which further improves the accuracy of the target network model and solves the problem that shrinking a large deep neural network model through model compression such as quantization and pruning seriously reduces the accuracy of the model.
  • the "adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model" in the above step 102 may include the following content:
  • the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model.
  • the computer device can use the knowledge distillation learning method to compare the feature vectors output by each layer of the network in the first calibration network model with the feature vectors output by each layer of the network in the initial network model, and then according to the comparison results, and the first Calibrate the weight parameters corresponding to each layer of network in the network model, and adjust the initial weight parameters in the initial network model.
  • The knowledge-distillation-based learning method uses the first calibration network model as a large teacher network to guide the learning of the small quantized initial network model, so that better weight parameters are obtained; the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model. The accuracy of the weight parameters of the first preprocessing model is thereby guaranteed, and the accuracy of the first preprocessing model is improved.
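  • As an illustration of the layer-wise feature comparison mentioned above, the following minimal sketch scores the gap between per-layer feature vectors of the quantized student and the teacher; the patent does not specify the comparison metric, so the mean-square error and the helper names used here are assumptions.

```python
import torch
import torch.nn.functional as F

def layerwise_feature_loss(student_features, teacher_features):
    """Accumulate the gap between the feature vectors output by corresponding
    layers of the quantized student (initial network model) and the teacher
    (first calibration network model). MSE is an illustrative metric."""
    loss = torch.zeros(())
    for feat_s, feat_t in zip(student_features, teacher_features):
        loss = loss + F.mse_loss(feat_s, feat_t.detach())  # teacher features are fixed targets
    return loss

# Hypothetical usage with per-layer features of matching shapes:
student_features = [torch.randn(4, 64), torch.randn(4, 128)]
teacher_features = [torch.randn(4, 64), torch.randn(4, 128)]
print(layerwise_feature_loss(student_features, teacher_features))
```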
  • the above-mentioned "learning method based on knowledge distillation, adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model” can be Include the following steps:
  • Step 301 acquire a first training image set.
  • the first training image set has hard labels.
  • the hard labels are labels corresponding to each image in the first training image set.
  • A hard label represents the label of the target object in the corresponding image.
  • the computer device may receive the first training image set sent by other devices, and may receive the first training image set input by the user.
  • the hard labels attached to the first training image set may be marked manually, or may be marked by a computer device based on a neural network model.
  • the embodiment of the present application does not specifically limit the manner of labeling the hard tags of the first training image set.
  • the first training image set includes multiple first training images.
  • Step 302 input the first training image set into the initial network model, and output the first result.
  • the computer device inputs the first training image set into the initial network model, and the initial network model performs feature extraction on the first training image set, and outputs a first result based on the extracted features.
  • Step 303 input the first training image set into the first calibration network model, and output the second result.
  • the computer device inputs the first training image set into the first calibration network model, the first calibration network model performs feature extraction on the first training image set, and outputs a second result based on the extracted features.
  • Step 304 based on the hard label, the first result and the second result, adjust the initial weight parameters of the initial network model to obtain a first preprocessing model.
  • The computer device compares the first result output by the initial network model with the hard labels carried by the first training image set, and also compares the first result with the second result output by the first calibration network model. The computer device then adjusts the initial weight parameters of the initial network model according to the comparison results to obtain the first preprocessing model.
  • the image X can be an image in the first training image set
  • the teacher network is the first calibration network model
  • W_T is the weight parameter of the teacher network
  • the student network is the initial network model
  • W_S is the initial weight parameter of the student network.
  • the image X is input to the teacher network, and the teacher network outputs the second result, namely P_T.
  • the image X is input to the student network, and the student network outputs the first result, namely P_S.
  • the computer device adjusts initial weight parameters of the initial network model based on P_T, P_S and label Y to obtain a first preprocessing model.
  • The first training image set, which carries hard labels, is input into the initial network model and the first calibration network model, producing the first result and the second result respectively. Using the relationship between the first result and the second result, and between the first result and the hard labels, the initial weight parameters of the initial network model are adjusted so that the first result output by the initial network model moves closer to the second result and the hard labels, which ensures that the first preprocessing model obtained after the weight parameter adjustment has improved accuracy.
  • Step 304, "adjust the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model", may include the following steps:
  • Step 501 generate a first loss function based on the first result and the hard label.
  • the computer device generates the first loss function based on the first result output by the initial network model and the hard label corresponding to the first training image set.
  • the first loss function represents the loss function of the initial network model during the training process.
  • the first loss function may be represented by H(Y, P_S), where Y represents the hard label corresponding to the first training image set, and P_S represents the first result output by the initial network model.
  • Step 502 generating a second loss function based on the first result and the second result.
  • the computer device generates the second loss function based on the first result output by the initial network model and the second result output by the first calibration network model.
  • the second loss function represents the initial network model as the loss function of the student network in the process of imitating the first calibration network model.
  • the second loss function may be represented by H(P_T, P_S), where P_T represents the second result output by the first calibration network model, and P_S represents the first result output by the initial network model.
  • Step 503 using the first loss function and the second loss function to generate a first target loss function, and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain a first preprocessing model.
  • The computer device may add the first loss function and the second loss function to generate the first target loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • The computer device may also multiply the first loss function by a first weighting coefficient α and the second loss function by a second weighting coefficient β, add the two weighted loss functions to obtain the first target loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model; that is, the first target loss function can take the form α·H(Y, P_S) + β·H(P_T, P_S).
  • The computer device can adjust the proportion of each loss function in the training process by adjusting the values of α and β.
  • The embodiment of the present application does not specifically limit the values of α and β.
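  • A minimal sketch of the weighted first target loss described above is given below, assuming the first loss H(Y, P_S) is a cross-entropy against the hard labels and the second loss H(P_T, P_S) is a KL divergence between the softened teacher and student outputs; these concrete choices, and the helper names, are illustrative assumptions rather than the exact formulation of the patent.

```python
import torch
import torch.nn.functional as F

def first_target_loss(logits_student, logits_teacher, hard_labels, alpha=1.0, beta=1.0):
    """Sketch of the first target loss: alpha * H(Y, P_S) + beta * H(P_T, P_S).

    logits_student: outputs of the initial network model (first result P_S),
    logits_teacher: outputs of the first calibration network model (second result P_T),
    hard_labels:    class indices Y carried by the first training image set.
    """
    # First loss: gap between the first result and the hard labels.
    loss_hard = F.cross_entropy(logits_student, hard_labels)
    # Second loss: gap between the first result and the second result,
    # expressed here as a KL divergence on the softened outputs.
    loss_soft = F.kl_div(F.log_softmax(logits_student, dim=1),
                         F.softmax(logits_teacher.detach(), dim=1),
                         reduction="batchmean")
    return alpha * loss_hard + beta * loss_soft

# Hypothetical one-step update of the student's weights (teacher stays fixed):
# loss = first_target_loss(student(x), teacher(x), y)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```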
  • The first loss function is generated from the first result output by the initial network model and the hard labels of the first training image set, and the second loss function is generated from the first result output by the initial network model and the second result output by the first calibration network model.
  • The first loss function represents the gap between the first result and the hard labels, and the second loss function represents the gap between the first result and the second result. The first target loss function generated from the two loss functions therefore characterizes both the gap between the first result and the hard labels and the gap between the first result and the second result.
  • The initial weight parameters of the initial network model are adjusted based on the first target loss function to obtain the first preprocessing model, thus improving the accuracy of the first preprocessing model.
  • step 103 "adjust the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model” , can include the following steps:
  • Step 601 based on the learning method of knowledge distillation, the activation quantization threshold of the first preprocessing model is adjusted according to the second calibration network model.
  • the computer device can use a knowledge distillation learning method to compare the feature vectors output by each layer of the network in the second calibration network model with the feature vectors output by each layer of the network in the first preprocessing model, and then according to the comparison results, and The activation quantization threshold corresponding to each network layer in the first calibration network model is adjusted to the activation quantization threshold in the first preprocessing model.
  • Step 602 Adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain a target network model.
  • The computer device can adjust the initial quantization parameters of the activation outputs of the first preprocessing model according to the correspondence between the adjusted activation quantization threshold and the quantization parameters of the activation outputs, and obtain the target network model from the adjusted quantization parameters of the activation outputs.
  • The knowledge-distillation-based learning method uses the second calibration network model as a large teacher network to guide the learning of the small quantized first preprocessing model, so that better model parameters are obtained.
  • Adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model ensures the accuracy of the adjusted activation quantization threshold.
  • Adjusting the initial quantization parameters of the activation outputs of the first preprocessing model according to the adjusted threshold to obtain the target network model further ensures the accuracy of those quantization parameters, thereby improving the accuracy of the resulting target network model.
  • The "knowledge-distillation-based learning method for adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model" in step 601 may include the following steps:
  • Step 701 acquire a second training image set.
  • the computer device may receive the second training image set sent by other devices, and may receive the second training image set input by the user.
  • The second training image set may consist of unlabeled images or labeled images, and the embodiment of the present application does not specifically limit the second training image set.
  • the second training image set can be the same as the first training image set, or can be different from the first training image set.
  • the second training image set may include multiple second training images.
  • Step 702 input the second training image set into the first preprocessing model, and output the third result.
  • the computer device inputs the second training image set into the first preprocessing model, and the first preprocessing model performs feature extraction on the second training image set, and outputs a third result based on the extracted features.
  • Step 703 input the second training image set into the second calibration network model, and output the fourth result.
  • the computer device inputs the second training image set into the second calibration network model, and the second calibration network model performs feature extraction on the second training image set, and outputs a fourth result based on the extracted features.
  • Step 704 Adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • the computer device compares the third result output by the first preprocessing model with the fourth result output by the second calibration network model.
  • the computer device adjusts the activation quantization threshold of the first preprocessing model according to the comparison result.
  • the image X may be an image in the second training image set
  • the full-precision teacher network is the second calibration network model
  • the low-precision student network is the first preprocessing model.
  • The computer device inputs the image X into the full-precision teacher network, and the full-precision teacher network outputs the fourth result, namely P_T in Fig. 8.
  • The computer device inputs the image X into the low-precision student network, and the low-precision student network outputs the third result, namely P_S in Fig. 8.
  • The computer device adjusts the activation quantization threshold of the first preprocessing model based on P_T and P_S.
  • The second training image set is input into the first preprocessing model and the second calibration network model, producing the third result and the fourth result respectively, and the activation quantization threshold of the first preprocessing model is adjusted based on the third result and the fourth result. This guarantees the accuracy of the adjusted activation quantization threshold and further ensures the accuracy of the first preprocessing model.
  • the "adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result" in the above step 704 may include the following steps:
  • Step 901 generate a second target loss function based on the third result and the fourth result.
  • the computer device generates the second target loss function based on the third result output by the first preprocessing model and the fourth result output by the second calibration network model.
  • The smaller the value of the second target loss function, the more it indicates that, under the same network structure, the first preprocessing model still has predictive ability similar to that of the second calibration network model after being quantized with the threshold T.
  • Here, T represents the activation quantization threshold of the first preprocessing model and X represents the images in the second training image set.
  • Step 902: adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • The computer device adjusts the activation quantization threshold of the first preprocessing model based on the function value calculated from the second target loss function.
  • Adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function ensures the accuracy of the adjusted threshold and, consequently, the accuracy of the activation-output quantization parameters calculated from it, thereby improving the accuracy of the target network model.
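  • As an illustration, the following sketch searches for an activation quantization threshold T that minimizes a second target loss between the third result (quantized first preprocessing model) and the fourth result (second calibration network model); the grid search, the KL-divergence loss, and the callable names are assumptions for illustration only, since the patent only requires that the threshold be adjusted based on the second target loss function.

```python
import torch
import torch.nn.functional as F

def second_target_loss(logits_student, logits_teacher):
    """Sketch of the second target loss: divergence between the third result
    (first preprocessing model) and the fourth result (second calibration
    network model). KL divergence is an illustrative choice here."""
    return F.kl_div(F.log_softmax(logits_student, dim=1),
                    F.softmax(logits_teacher, dim=1),
                    reduction="batchmean")

def search_activation_threshold(student_forward, teacher_forward, images, candidates):
    """Pick the activation quantization threshold T whose quantized student output
    stays closest to the teacher output on the second training image set.

    student_forward(images, T) and teacher_forward(images) are assumed callables;
    a grid search over candidate thresholds is used purely for illustration."""
    best_T, best_loss = None, float("inf")
    with torch.no_grad():
        logits_teacher = teacher_forward(images)
        for T in candidates:
            loss = second_target_loss(student_forward(images, T), logits_teacher).item()
            if loss < best_loss:
                best_T, best_loss = T, loss
    return best_T
```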
  • The computer device can also treat the initial network model and the first preprocessing model as the same model, collectively referred to as the initial network model in the embodiments of this application.
  • the training process of the initial network model can include the following:
  • The computer device first adjusts the initial weight parameters of the initial network model according to the first target loss function, and then, based on the adjusted weight parameters, adjusts the activation quantization threshold of the initial network model according to the second target loss function. After one round of adjustment, the weight parameters and the activation quantization threshold of the initial network model may still be unsatisfactory.
  • In that case, the computer device continues to adjust the initial weight parameters of the initial network model according to the first target loss function and then, based on the adjusted weight parameters, adjusts the activation quantization threshold of the initial network model according to the second target loss function.
  • The computer device thus cyclically adjusts the initial weight parameters and the activation quantization threshold of the initial network model. After multiple iterations of training, the training of the initial network model is completed and the target network model is generated, which ensures the accuracy of the target network model, as sketched below.
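  • A minimal sketch of this alternating procedure is given below; the callables and the fixed round count are hypothetical placeholders, since the patent only requires that the two adjustments alternate over multiple iterations.

```python
def quantize_network(adjust_weights, adjust_threshold, num_rounds=10):
    """Cyclic adjustment sketch: alternate between tuning the weight parameters
    against the first target loss and tuning the activation quantization threshold
    against the second target loss, for a fixed number of training rounds.

    adjust_weights() and adjust_threshold() are hypothetical callables wrapping
    the two adjustment steps described above."""
    for _ in range(num_rounds):
        adjust_weights()     # adjust weights using the first calibration network model
        adjust_threshold()   # adjust the activation threshold using the second calibration network model

# Hypothetical usage with no-op placeholders:
# quantize_network(lambda: None, lambda: None, num_rounds=3)
```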
  • The embodiment of the present application provides an overall flow of the network model quantization method, as shown in Fig. 10; the method includes:
  • Step 1001: obtain the network model to be processed, the network model to be processed being a pre-trained full-precision network model; quantize the weight parameters and the activation outputs of the network model to be processed according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation outputs; and construct the initial network model based on the initial weight parameters and the initial quantization parameters of the activation outputs.
  • Step 1002: acquire the first training image set.
  • Step 1003: input the first training image set into the initial network model and output the first result.
  • Step 1004: acquire the first calibration network model, input the first training image set into the first calibration network model, and output the second result.
  • Step 1005: generate the first loss function based on the first result and the hard labels.
  • Step 1006: generate the second loss function based on the first result and the second result.
  • Step 1007: use the first loss function and the second loss function to generate the first target loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • Step 1008: acquire the second training image set.
  • Step 1009: input the second training image set into the first preprocessing model and output the third result.
  • Step 1010: acquire the second calibration network model, input the second training image set into the second calibration network model, and output the fourth result.
  • Step 1011: generate the second target loss function based on the third result and the fourth result.
  • Step 1012: adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • Step 1013: adjust the initial quantization parameters of the activation outputs of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  • The above network model quantization method may be as shown in Fig. 11, and includes the following steps:
  • Parameter initialization of the low-precision network: based on the pre-trained full-precision student network, the post-training quantization (PTQ) method is used to initialize the low-precision student network, initially determining the low-precision weight values and the activation quantization range values of the student network that need to be quantized.
  • Network structure deployment: based on the quantized network model parameters, the model structure is deployed on the actual hardware platform to perform the corresponding task processing, such as image classification/detection/recognition tasks or natural language processing tasks.
  • The steps in the flowcharts of Figs. 9-10 may include multiple sub-steps or stages; these sub-steps or stages are not necessarily executed at the same moment but may be executed at different times, and their execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
  • The network model quantization apparatus 1200 includes a quantization processing module 1210, a first adjustment module 1220, and a second adjustment module 1230, wherein:
  • The quantization processing module 1210 is configured to obtain the network model to be processed, the network model to be processed being a pre-trained full-precision network model, quantize the weight parameters and the activation outputs of the network model to be processed according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation outputs, and construct the initial network model based on the initial weight parameters and the initial quantization parameters of the activation outputs.
  • The first adjustment module 1220 is configured to obtain the first calibration network model, whose accuracy is higher than that of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model.
  • The second adjustment module 1230 is configured to obtain the second calibration network model, whose accuracy is higher than that of the initial network model, and adjust the initial quantization parameters of the activation outputs of the first preprocessing model based on the second calibration network model to obtain the target network model.
  • The above first adjustment module 1220 is specifically configured to use a knowledge-distillation-based learning method to adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • The above first adjustment module 1220 is specifically configured to: obtain the first training image set, the first training image set carrying hard labels; input the first training image set into the initial network model and output the first result; input the first training image set into the first calibration network model and output the second result; and adjust the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
  • The above first adjustment module 1220 is specifically configured to: generate the first loss function based on the first result and the hard labels; generate the second loss function based on the first result and the second result; generate the first target loss function from the first loss function and the second loss function; and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • the above-mentioned second adjustment module 1230 includes: a first adjustment unit 1231 and a second adjustment unit 1232, wherein:
  • the first adjustment unit 1231 is configured to use a learning method based on knowledge distillation and adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model;
  • the second adjustment unit 1232 is configured to adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  • the above-mentioned first adjustment unit 1231 is specifically configured to: acquire the second training image set; input the second training image set into the first preprocessing model, and output the third result; input the second training image set into the second calibration network model, and output the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • the above-mentioned first adjustment unit 1231 is specifically configured to: generate a second target loss function based on the third result and the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • Each module in the above-mentioned apparatus for network model quantization can be fully or partially realized by software, hardware, or a combination thereof.
  • the above-mentioned modules can be embedded in or independent of the processor in the computer device in the form of hardware, and can also be stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the corresponding operations of the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a terminal, and its internal structure may be as shown in FIG. 14 .
  • the computer device includes a processor, a memory, a communication interface, a display screen and an input device connected through a system bus. Wherein, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and computer programs.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the communication interface of the computer device is used to communicate with an external terminal in a wired or wireless manner, and the wireless manner can be realized through WIFI, an operator network, NFC (Near Field Communication) or other technologies.
  • when the computer program is executed by the processor, a network model quantization method is implemented.
  • the display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen.
  • the input device of the computer device may be a touch layer covering the display screen, or a button, a trackball or a touchpad provided on the casing of the computer device, and may also be an external keyboard, touchpad or mouse.
  • Figure 14 is only a block diagram of a partial structure related to the solution of this application, and does not constitute a limitation on the computer equipment on which the solution of this application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, or combine some components, or have a different arrangement of components.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 15 .
  • the computer device includes a processor, memory and a network interface connected by a system bus. Wherein, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer programs and databases.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer device is used to store network model quantification data.
  • the network interface of the computer device is used to communicate with an external terminal via a network connection. When the computer program is executed by a processor, a network model quantification method is implemented.
  • Figure 15 is only a block diagram of a partial structure related to the solution of this application, and does not constitute a limitation on the computer equipment on which the solution of this application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, or combine some components, or have a different arrangement of components.
  • a computer device including a memory and a processor.
  • a computer program is stored in the memory.
  • when the processor executes the computer program, the following steps are implemented: acquiring a network model to be processed, where the network model to be processed is a pre-trained full-precision network model; performing quantization processing on the weight parameters and the activation output of the network model to be processed respectively according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output, and constructing the initial network model based on the initial weight parameters and the initial quantization parameter of the activation output; obtaining a first calibration network model, where the accuracy of the first calibration network model is higher than the accuracy of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and obtaining a second calibration network model, where the accuracy of the second calibration network model is higher than the accuracy of the first preprocessing model, and adjusting the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model.
  • when the processor executes the computer program, the following steps are further implemented: using a learning method based on knowledge distillation, adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • when the processor executes the computer program, the following steps are also implemented: obtaining the first training image set, where the first training image set has hard labels; inputting the first training image set to the initial network model, and outputting a first result; inputting the first training image set to the first calibration network model, and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model.
  • when the processor executes the computer program, the following steps are also implemented: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; using the first loss function and the second loss function to generate a first target loss function; and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • when the processor executes the computer program, the following steps are also implemented: using a learning method based on knowledge distillation, adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model; and adjusting the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  • when the processor executes the computer program, the following steps are also implemented: acquiring the second training image set; inputting the second training image set into the first preprocessing model, and outputting a third result; inputting the second training image set into the second calibration network model, and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • when the processor executes the computer program, the following steps are also implemented: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the following steps are implemented: obtaining a network model to be processed, where the network model to be processed is a pre-trained full-precision network model; according to the quantization requirements, quantizing the weight parameters and the activation output of the network model to be processed respectively to obtain the initial weight parameters and the initial quantization parameters of the activation output; constructing the initial network model based on the initial weight parameters and the initial quantization parameters of the activation output; obtaining a first calibration network model, where the accuracy of the first calibration network model is higher than the accuracy of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and obtaining a second calibration network model, where the precision of the second calibration network model is higher than the precision of the first preprocessing model, and adjusting the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model.
  • the following steps are further implemented: using a learning method based on knowledge distillation, adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • the following steps are also implemented: obtaining the first training image set, where the first training image set has hard labels; inputting the first training image set to the initial network model, and outputting a first result; inputting the first training image set to the first calibration network model, and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model.
  • the following steps are further implemented: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; using the first loss function and the second loss function to generate a first target loss function; and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • the following steps are also implemented: using a learning method based on knowledge distillation, adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model; and, according to the adjusted activation quantization threshold, adjusting the initial quantization parameter of the activation output of the first preprocessing model to obtain the target network model.
  • the following steps are also implemented: acquiring the second training image set; inputting the second training image set into the first preprocessing model, and outputting a third result; inputting the second training image set into the second calibration network model, and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • the following steps are further implemented: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • the storage medium can be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a flash memory (Flash Memory), a hard disk (Hard Disk Drive, HDD) or a solid-state drive (Solid-State Drive, SSD), etc.; the storage medium may also include a combination of the above-mentioned types of memory.

Abstract

Disclosed in the present application are a network model quantization method and apparatus, and a computer device and a storage medium, which are applicable to the technical field of artificial intelligence. The network model quantization method comprises: acquiring a network model to be processed, and, according to quantization requirements, separately performing quantization processing on a weight parameter and an activation output of the network model to be processed, so as to obtain an initial weight parameter and an initial quantization parameter of the activation output, and constructing an initial network model; acquiring a first calibration network model, and adjusting the initial weight parameter of the initial network model on the basis of the first calibration network model, so as to obtain a first pre-processed model; and acquiring a second calibration network model, and adjusting an initial quantization parameter of an activation output of the first pre-processed model on the basis of the second calibration network model, so as to obtain a target network model. By using the method, the problem that the precision of a deep neural network model is reduced when a large deep neural network model is shrunk by means of model compression such as quantization and cropping can be solved.

Description

Network model quantization method, device, computer equipment and storage medium
This application claims the priority of the Chinese patent application submitted to the China Patent Office on September 28, 2021, with the application number 202111139349.X and the invention title "Network Model Quantization Method, Device, Computer Equipment, and Storage Medium", the entire content of which is incorporated in this application by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a network model quantization method, device, computer equipment and storage medium.
Background
With the continuous development of artificial intelligence technology, the application of artificial intelligence technology is becoming more and more extensive. In the field of artificial intelligence technology, deep learning is one of the more typical technologies. The essence of deep learning is the artificial neural network, and a neural network with many layers is called a deep neural network. At present, although the capabilities of deep neural network models in image classification, detection and other tasks are close to or surpass those of humans, in actual deployment there are still problems such as large models and high computational complexity, which lead to high hardware costs. In practical applications, in order to reduce hardware costs, neural network models are usually deployed on terminal devices or edge devices; these devices generally have low computing power, and their memory and power consumption are also limited. Therefore, how to shrink a large deep neural network model while keeping its accuracy unchanged, so that the deep neural network model can be truly deployed on the terminal, has become an urgent problem to be solved.
In the prior art, model compression methods such as quantization and cropping are usually used to reduce the size of the deep neural network model, thereby shrinking the large deep neural network model.
However, in the above prior art, in the process of shrinking a large deep neural network model by means of model compression such as quantization and cropping, the accuracy of the deep neural network model is seriously reduced, so that the precision of the shrunken deep neural network model is low, which affects the application of the shrunken deep neural network model.
Summary of the Invention
In view of this, the embodiments of the present application provide a network model quantization method, device, computer equipment and storage medium, to solve the problem that the accuracy of a deep neural network model is relatively low after a large deep neural network model is shrunk by means of model compression such as quantization and cropping.
According to a first aspect, an embodiment of the present application provides a network model quantization method, the method including: obtaining a network model to be processed, where the network model to be processed is a pre-trained full-precision network model, quantizing the weight parameters and the activation output of the network model to be processed respectively according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output, and constructing an initial network model based on the initial weight parameters and the initial quantization parameter of the activation output; obtaining a first calibration network model, where the accuracy of the first calibration network model is higher than that of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and obtaining a second calibration network model, where the accuracy of the second calibration network model is higher than the accuracy of the initial network model, and adjusting the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
In this embodiment, the pre-trained full-precision network model is first obtained and used as the network model to be processed; then the weight parameters and the activation output of the network model to be processed are quantized respectively according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output, and the initial network model is constructed based on the initial weight parameters and the initial quantization parameter of the activation output. Since the weights and the activation output of the network model to be processed are quantized, the size of the initial network model constructed based on the initial weight parameters and the initial quantization parameter of the activation output is much smaller than that of the network model to be processed, which ensures that the initial network model can run on some terminal devices and edge devices. In addition, because the accuracy of the initial network model obtained after quantization is low, the initial weight parameters of the initial network model can be adjusted based on the first calibration network model, whose model accuracy is higher than that of the initial network model, to obtain the first preprocessing model, so that the accuracy of the weight parameters of the first preprocessing model can be guaranteed and the accuracy of the first preprocessing model improved. Furthermore, the initial quantization parameter of the activation output of the first preprocessing model can be adjusted based on the second calibration network model, whose model accuracy is higher than that of the first preprocessing model, to obtain the target network model. As a result, the target network model is not only smaller in size, but its weight parameters and activation output range are also more accurate, which further improves the accuracy of the target network model and solves the problem that the accuracy of a deep neural network model becomes low when a large deep neural network model is shrunk by model compression such as quantization and cropping.
With reference to the first aspect, in a first implementation of the first aspect, adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model includes: using a learning method based on knowledge distillation, adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
In this embodiment, since the accuracy of the first calibration network model is higher than that of the initial network model, the learning method based on knowledge distillation uses the first calibration network model as the large teacher network model to guide the learning of the small quantized initial network model so as to obtain better model parameters, and the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model. In this way, the accuracy of the weight parameters of the obtained first preprocessing model can be guaranteed, and the accuracy of the first preprocessing model can be improved.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, using a learning method based on knowledge distillation and adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model includes: obtaining a first training image set, where the first training image set has hard labels; inputting the first training image set to the initial network model, and outputting a first result; inputting the first training image set to the first calibration network model, and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model.
In this embodiment, the first training image set with hard labels is input to the initial network model and the first calibration network model respectively, and the first result and the second result are output respectively; the initial weight parameters of the initial network model are adjusted using the relationship between the first result and the second result, as well as between the first result and the hard labels, so that the first result output by the initial network model can be closer to the second result and the hard labels, thereby ensuring that the accuracy of the first preprocessing model obtained after the weight parameter adjustment is improved.
With reference to the second implementation of the first aspect, in a third implementation of the first aspect, adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model includes: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; and using the first loss function and the second loss function to generate a first target loss function, and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
In this embodiment, the first loss function is generated based on the first result output by the initial network model and the hard labels of the first training image set, and the second loss function is generated based on the first result output by the initial network model and the second result output by the first calibration network model. The first loss function can be used to represent the gap between the first result and the hard labels, and the second loss function can be used to represent the gap between the first result and the second result. Therefore, the first target loss function generated from the first loss function and the second loss function can characterize both the gap between the first result and the hard labels and the gap between the first result and the second result. The initial weight parameters of the initial network model are adjusted based on the first target loss function to obtain the first preprocessing model, thereby improving the accuracy of the first preprocessing model.
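As a concrete illustration only, the sketch below shows one way such a combined objective could be written: a hard-label loss plus a distillation loss, mixed with a weighting factor. The cross-entropy term, the temperature-softened KL divergence and the names introduced here (`first_target_loss`, `alpha`, `temperature`) are common knowledge-distillation choices assumed for this example; they are not details prescribed by the application.

```python
import torch
import torch.nn.functional as F

def first_target_loss(student_logits, teacher_logits, hard_labels,
                      alpha=0.5, temperature=4.0):
    """Combine two losses into one objective for tuning the student's weights.

    first loss  : cross-entropy between the student output (first result) and the hard labels
    second loss : KL divergence between softened student and teacher outputs (first and second results)
    """
    loss_hard = F.cross_entropy(student_logits, hard_labels)
    loss_soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * loss_hard + (1.0 - alpha) * loss_soft

# Example: one optimization step on stand-in logits.
student_logits = torch.randn(8, 10, requires_grad=True)  # stand-in for the quantized initial network output
teacher_logits = torch.randn(8, 10)                      # stand-in for the first calibration network output
labels = torch.randint(0, 10, (8,))
loss = first_target_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In this sketch, minimizing the combined loss pulls the first result toward both the hard labels and the second result, which is the behavior described above.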
With reference to the first aspect, in a fourth implementation of the first aspect, adjusting the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model includes: using a learning method based on knowledge distillation, adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model; and adjusting the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
In this implementation, since the accuracy of the second calibration network model is higher than that of the first preprocessing model, the learning method based on knowledge distillation uses the second calibration network model as the large teacher network model to guide the learning of the small quantized first preprocessing model so as to obtain better model parameters. Adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model can ensure the accuracy of the adjusted activation quantization threshold. Further, adjusting the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model can further ensure the accuracy of the adjusted initial quantization parameter of the activation output of the first preprocessing model, thereby improving the accuracy of the obtained target network model.
With reference to the fourth implementation of the first aspect, in a fifth implementation of the first aspect, using a learning method based on knowledge distillation and adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model includes: acquiring a second training image set; inputting the second training image set to the first preprocessing model, and outputting a third result; inputting the second training image set to the second calibration network model, and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
In this embodiment, the second training image set is input to the first preprocessing model and the second calibration network model respectively, the third result and the fourth result are output, and the activation quantization threshold of the first preprocessing model is adjusted based on the third result and the fourth result, so that the accuracy of the adjusted activation quantization threshold can be guaranteed and the accuracy of the obtained target network model further ensured.
With reference to the fifth implementation of the first aspect, in a sixth implementation of the first aspect, adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result includes: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
In this embodiment, the second target loss function is generated based on the third result and the fourth result, and the smaller the value of the second target loss function, the smaller the gap between the third result and the fourth result. Therefore, adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function can ensure the accuracy of the adjusted activation quantization threshold, and further ensure the accuracy of the quantization parameter of the activation output calculated based on the adjusted activation quantization threshold, thereby improving the accuracy of the target network model.
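As an illustrative sketch of how the activation quantization threshold could be tuned against the second calibration network, the code below exposes the threshold as a learnable parameter and optimizes it with a distillation-style loss. The straight-through estimator, the KL-based `second_target_loss` and all names are assumptions made for this example, not the application's prescribed implementation.

```python
import torch
import torch.nn.functional as F

class LearnableActivationQuantizer(torch.nn.Module):
    """Fake-quantizes activations with a learnable clipping threshold T.
    A straight-through estimator lets the gradient of the distillation loss
    reach T, so the threshold can be tuned against the calibration network."""

    def __init__(self, init_T, n_bits=8):
        super().__init__()
        self.T = torch.nn.Parameter(torch.tensor(float(init_T)))
        self.q_max = 2 ** (n_bits - 1) - 1

    def forward(self, x):
        s = self.T / self.q_max
        x_clipped = torch.maximum(torch.minimum(x, self.T), -self.T)
        q = torch.round(x_clipped / s) * s
        # forward pass uses the rounded value, backward sees the differentiable clipped value
        return x_clipped + (q - x_clipped).detach()

def second_target_loss(student_logits, teacher_logits, temperature=4.0):
    """Distillation-only objective between the first preprocessing model (third result)
    and the second calibration network (fourth result); no hard labels are used here."""
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

# Example: tune one quantizer's threshold with the distillation loss.
quant = LearnableActivationQuantizer(init_T=6.0)
optimizer = torch.optim.Adam([quant.T], lr=1e-3)
acts = torch.randn(8, 10) * 3.0
student_logits = quant(acts)   # stand-in for the quantized model's output
teacher_logits = acts          # stand-in for the calibration network's output
loss = second_target_loss(student_logits, teacher_logits)
loss.backward()
optimizer.step()
```

Minimizing this loss shrinks the gap between the third and fourth results, which is the criterion stated above for a well-chosen threshold.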
According to a second aspect, an embodiment of the present application provides a network model quantization apparatus, the apparatus including:
a quantization processing module, used to obtain a network model to be processed, where the network model to be processed is a pre-trained full-precision network model, to quantize the weight parameters and the activation output of the network model to be processed respectively according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output, and to construct an initial network model based on the initial weight parameters and the initial quantization parameter of the activation output;
a first adjustment module, used to obtain a first calibration network model, where the accuracy of the first calibration network model is higher than the accuracy of the initial network model, and to adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model;
a second adjustment module, used to obtain a second calibration network model, where the accuracy of the second calibration network model is higher than the accuracy of the initial network model, and to adjust the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
With reference to the first aspect, in the first implementation of the first aspect, the above first adjustment module is specifically configured to use a learning method based on knowledge distillation and adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
With reference to the first implementation of the first aspect, in the second implementation of the first aspect, the above first adjustment module is specifically configured to obtain the first training image set, where the first training image set has hard labels; input the first training image set to the initial network model, and output a first result; input the first training image set to the first calibration network model, and output a second result; and adjust the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model.
With reference to the second implementation of the first aspect, in the third implementation of the first aspect, the above first adjustment module is specifically configured to generate a first loss function based on the first result and the hard labels; generate a second loss function based on the first result and the second result; and use the first loss function and the second loss function to generate a first target loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
With reference to the first aspect, in the fourth implementation of the first aspect, the above second adjustment module includes:
a first adjustment unit, used to adopt a learning method based on knowledge distillation and adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model;
a second adjustment unit, used to adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
With reference to the fourth implementation of the first aspect, in the fifth implementation of the first aspect, the above first adjustment unit is specifically configured to: acquire the second training image set; input the second training image set into the first preprocessing model, and output the third result; input the second training image set into the second calibration network model, and output the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
With reference to the fifth implementation of the first aspect, in the sixth implementation of the first aspect, the above first adjustment unit is specifically configured to: generate a second target loss function based on the third result and the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
According to a third aspect, an embodiment of the present application provides an electronic device/mobile terminal/server, including: a memory and a processor, where the memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the computer instructions so as to perform the network model quantization method in the first aspect or any implementation of the first aspect.
According to a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to perform the network model quantization method in the first aspect or any implementation of the first aspect.
According to a fifth aspect, an embodiment of the present application provides a computer program product, the computer program product including a computer program stored on a computer-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a computer, the computer performs the network model quantization method in the first aspect or any implementation of the first aspect.
Description of the Drawings
The features and advantages of the present application will be more clearly understood by referring to the accompanying drawings, which are schematic and should not be construed as limiting the application in any way. In the accompanying drawings:
Fig. 1 shows a flow chart of the steps of the network model quantization method in one embodiment;
Fig. 2a shows a schematic diagram of unsaturated mapping in PTQ model quantization in the network model quantization method in one embodiment;
Fig. 2b shows a schematic diagram of saturated mapping in PTQ model quantization in the network model quantization method in one embodiment;
Fig. 3 shows a flow chart of the steps of the network model quantization method in another embodiment;
Fig. 4 shows a schematic diagram of the process of adjusting the weight parameters of the initial network model in the network model quantization method in one embodiment;
Fig. 5 shows a flow chart of the steps of the network model quantization method in another embodiment;
Fig. 6 shows a flow chart of the steps of the network model quantization method in another embodiment;
Fig. 7 shows a flow chart of the steps of the network model quantization method in another embodiment;
Fig. 8 shows a schematic diagram of the process of adjusting the activation output threshold of the first preprocessing model in the network model quantization method in another embodiment;
Fig. 9 shows a flow chart of the steps of the network model quantization method in another embodiment;
Fig. 10 shows a flow chart of the steps of the network model quantization method in another embodiment;
Fig. 11 shows a schematic flowchart of the network model quantization method in another embodiment;
Fig. 12 shows a structural block diagram of the network model quantization device in one embodiment;
Fig. 13 shows a structural block diagram of the network model quantization device in one embodiment;
Fig. 14 shows an internal structure diagram of the computer device of one embodiment when the computer device is a server;
Fig. 15 shows an internal structure diagram of the computer device of one embodiment when the computer device is a terminal.
Detailed Description
In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are a part of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without making creative efforts belong to the scope of protection of this application.
It should be noted that the network model quantization method provided by the embodiments of the present application can be executed by a network model quantization apparatus, and the network model quantization apparatus can be implemented as part or all of a computer device through software, hardware, or a combination of software and hardware. The computer device may be a server or a terminal; the server in the embodiments of the present application may be a single server or a server cluster composed of multiple servers, and the terminal in the embodiments of the present application may be a smartphone, a personal computer, a tablet computer, a wearable device, an intelligent robot or another intelligent hardware device. In the following method embodiments, the execution subject being a computer device is taken as an example for illustration.
In one embodiment of the present application, as shown in Figure 1, a network model quantization method is provided. Taking the application of the method to a computer device as an example for illustration, the method includes the following steps:
Step 101: obtain a network model to be processed, where the network model to be processed is a pre-trained full-precision network model; quantize the weight parameters and the activation output of the network model to be processed respectively according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output; and construct an initial network model based on the initial weight parameters and the initial quantization parameter of the activation output.
Specifically, the computer device can use the first target image training set to train a neural network model and obtain the network model to be processed. The network model to be processed is a pre-trained full-precision network model. The network model to be processed can be used to process tasks such as image recognition, image detection and image classification. The embodiment of the present application does not specifically limit the application scenarios of the network model to be processed.
Optionally, the computer device may also receive a network model to be processed sent by another device, or receive a network model to be processed input by a user. The embodiment of the present application does not specifically limit the manner in which the computer device acquires the network model to be processed.
In the embodiment of the present application, the computer device quantizes the weight parameters and the activation output of the network model to be processed respectively according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output, and constructs the initial network model based on the initial weight parameters and the initial quantization parameter of the activation output. The quantization requirements may be input to the computer device by the user through the input component of the computer device, and can be changed according to the actual situation. The quantization requirements can represent the bit-width requirements of the weight parameters and the activation output. For example, the quantization requirement may be to reduce the size of the network model to be processed by 4 times, converting the weight parameters and the activation output of the network model to be processed from float32 to int8. The embodiment of the present application does not specifically limit the quantization requirements. The accuracy of the initial network model is much lower than that of the network model to be processed, and the size of the initial network model is also much smaller than the size of the network model to be processed.
In the embodiment of the present application, the computer device can use the post-training quantization method (Post-Training Quantization, PTQ) or the training-aware quantization method (Training-Aware Quantization, TAQ) to quantize the weight parameters and the activation output of the network model to be processed respectively. The embodiment of the present application does not specifically limit the method of separately quantizing the weight parameters and the activation output of the network model to be processed.
In order to better understand the network model quantization method of the embodiment of the present application, the following explains it by taking the use of the PTQ method to quantize the weight parameters and the activation output of the network model to be processed respectively as an example.
The central idea of the PTQ quantization method is to calculate the quantization threshold T and, according to the quantization threshold T, determine the mapping relationship between the weights of the network model to be processed and the weights of the initial network model, as well as the mapping relationship between the activation output of the network model to be processed and the activation output of the initial network model.
Take converting the weights and the activation output of the network model to be processed from float32 to int8 as an example. The mapping relationship between the weight parameters of the network model to be processed and the weight parameters of the initial network model, and the mapping relationship between the activation output of the network model to be processed and the activation output of the initial network model, include two types: saturated mapping and unsaturated mapping. Generally, when weights are quantized, the unsaturated mapping shown in Figure 2a is used, and the quantization threshold T is then equal to the maximum value. When the activation output is quantized, a saturated mapping is generally used, as shown in Figure 2b. The quantization threshold T of the saturated mapping can be searched for by the relative entropy divergence method or the mean square error method. The criterion for finding the quantization threshold T is to find a threshold such that, after the original values are clipped based on this threshold, the difference from the original values is still the smallest.
In the saturated quantization process, the part exceeding the threshold T needs to be clipped, as shown in the second item of formula (1). Clipping means, for example, that if T = 5 and the original values contain 6, which is greater than 5, then 6 is also forced to 5.
Formula (1) can be written as:
s = T / (2^(n-1) - 1)                                  (first item of formula (1))
q(x, T) = s · round( clip(x, -T, T) / s )              (second item of formula (1))
where s is the quantization mapping scale factor, x is the original value, q(x, T) represents the value of x after quantization and de-quantization, n is the bit width to be quantized, and T is the quantization threshold. For example, if x is the original float32 number, the converted int8 number is q_x, with q_x = x/s; n is the bit width to be quantized, such as 8-bit, 4-bit, 2-bit or 1-bit, and when n = 8-bit, s = T/127, as shown in the first item of formula (1). round(·) denotes rounding, which can be round-to-nearest, round-up or round-down.
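For illustration, a minimal sketch of this saturated quantize-dequantize operation and of one possible threshold search is given below; it assumes symmetric per-tensor quantization, and the function names (`quantize_dequantize`, `find_threshold_mse`) and the linear candidate sweep are choices made for this example rather than details taken from the application.

```python
import numpy as np

def quantize_dequantize(x, T, n_bits=8):
    """Saturated quantize-dequantize q(x, T): values beyond +/-T are clipped,
    the remainder is mapped to the signed integer grid and mapped back."""
    q_max = 2 ** (n_bits - 1) - 1       # 127 when n_bits = 8
    s = T / q_max                       # scale factor, s = T / 127 for 8-bit
    x_clipped = np.clip(x, -T, T)       # clipping of values exceeding the threshold
    return np.round(x_clipped / s) * s  # round to the grid, then de-quantize

def find_threshold_mse(x, n_bits=8, num_candidates=100):
    """Search a threshold T that minimizes the mean squared error between the
    original tensor and its quantize-dequantized version (one possible criterion)."""
    max_abs = float(np.abs(x).max())
    best_T, best_err = max_abs, float("inf")
    for T in np.linspace(max_abs / num_candidates, max_abs, num_candidates):
        err = float(np.mean((x - quantize_dequantize(x, T, n_bits)) ** 2))
        if err < best_err:
            best_T, best_err = T, err
    return best_T

# Example: calibrate an activation threshold on simulated calibration data.
acts = np.random.randn(10000).astype(np.float32) * 2.0
T = find_threshold_mse(acts)
print(T, np.mean((acts - quantize_dequantize(acts, T)) ** 2))
```

A relative entropy (KL divergence) criterion could be substituted for the mean squared error in the same search loop.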
Step 102: obtain a first calibration network model, where the accuracy of the first calibration network model is higher than that of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model.
Here, the accuracy of the first calibration network model being higher than the accuracy of the initial network model can mean at least one of the following: the performance accuracy of the first calibration network model is higher than the performance accuracy of the initial network model, and the bandwidth precision of the parameters of the first calibration network model is higher than the bandwidth precision of the parameters of the initial network model.
Specifically, the computer device can use the second target image training set to train a neural network model and obtain the first calibration network model. The accuracy of the first calibration network model is higher than the accuracy of the initial network model. The first calibration network model can be used for image recognition, image detection and image classification tasks. The embodiment of the present application does not specifically limit the application scenario of the first calibration network model.
As an optional implementation, the computer device can also receive the first calibration network model sent by another device, or receive the first calibration network model input by the user. The embodiment of this application does not specifically limit the manner in which the computer device obtains the first calibration network model.
Further, the computer device can adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
As an optional implementation, the computer device can also compare the output result of the first calibration network model with the output result of the initial network model, and adjust the initial weight parameters of the initial network model according to the comparison result to obtain the first preprocessing model. In step 101, after the full-precision network is converted into the low-precision initial network model, the main reasons for the decrease in model performance accuracy generally come from two parts: the change of the weight parameters and the selection of the activation threshold. In the post-training quantization process, all weight parameters are usually truncated with the same approximation method, but the same approximation method may not be suitable for all weight parameters, so this implicitly introduces noise and affects the feature extraction ability of the network model. In this step, by comparing the output result of the first calibration network model with the output result of the initial network model, the first calibration network model is used to correct the initial weight parameters of the initial network model, reducing the error generated in the above process.
Step 103: Obtain a second calibration network model, where the accuracy of the second calibration network model is higher than that of the first preprocessing model, and adjust the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
Here, the accuracy of the second calibration network model being higher than that of the first preprocessing model may mean at least one of the following: the performance accuracy of the second calibration network model is higher than the performance accuracy of the first preprocessing model, and the bit-width precision of the parameters of the second calibration network model is higher than the bit-width precision of the parameters of the first preprocessing model.
Specifically, the computer device may train a neural network model with a third target image training set to obtain the second calibration network model, where the accuracy of the second calibration network model is higher than that of the first preprocessing model. The second calibration network model may be used for image recognition, image detection, and image classification tasks. The embodiments of the present application do not specifically limit the application scenario of the second calibration network model.
As an optional implementation, the computer device may also receive the second calibration network model sent by another device, or receive a second calibration network model input by a user. The embodiments of the present application do not specifically limit the manner in which the computer device obtains the second calibration network model. The second calibration network model may be the same pre-trained full-precision network model as the network model to be processed, or a different pre-trained full-precision network model.
As an implementation, the computer device may adjust the initial quantization parameter of the activation output of the first preprocessing model according to the second calibration network model to obtain the target network model. To improve the accuracy of the low-precision initial network model, in addition to adjusting the initial weight parameters in step 102, step 103 further adjusts the initial activation thresholds. The computer device may also compare the output of the second calibration network model with the output of the first preprocessing model, and adjust the initial quantization parameter of the activation output of the first preprocessing model according to the comparison result to obtain the target network model, thereby further reducing the loss caused by converting the full-precision model into a low-precision model and improving the accuracy of the model.
In this embodiment, a pre-trained full-precision network model is first obtained and used as the network model to be processed; the weight parameters and the activation output of the network model to be processed are then quantized separately according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output, and an initial network model is constructed based on the initial weight parameters and the initial quantization parameter of the activation output. Because the weight parameters and the activation output of the network model to be processed are quantized, the size of the initial network model constructed from them is much smaller than that of the network model to be processed, which ensures that the initial network model can run on terminal devices and edge devices. In addition, because the accuracy of the initial network model obtained after quantization is low, the initial weight parameters of the initial network model can be adjusted based on the first calibration network model, whose accuracy is higher than that of the initial network model, to obtain the first preprocessing model, which ensures the accuracy of the weight parameters of the first preprocessing model and thus improves its accuracy. Furthermore, the initial quantization parameter of the activation output of the first preprocessing model can be adjusted based on the second calibration network model, whose accuracy is higher than that of the first preprocessing model, to obtain the target network model. As a result, the target network model is not only small in size but also has accurate weight parameters and activation outputs, which further improves the accuracy of the target network model and solves the problem that shrinking a large deep neural network model through model compression such as quantization and pruning severely reduces the accuracy of the deep neural network model.
In an optional embodiment of the present application, "adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model" in step 102 may include the following content:
Based on a knowledge distillation learning method, the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model.
Here, knowledge distillation refers to the idea of model compression: a larger, already trained network is used, step by step, to teach a smaller network exactly what to do. "Soft labels" refer to the feature vectors output by the large network after each convolutional layer. The small network is then trained to learn the exact behavior of the large network by trying to replicate the output of the large network at every layer, not only the final loss.
Specifically, the computer device may use the knowledge distillation learning method to compare the feature vector output by each network layer of the first calibration network model with the feature vector output by the corresponding layer of the initial network model, and then adjust the initial weight parameters of the initial network model according to the comparison result and the weight parameters corresponding to each network layer of the first calibration network model.
Compared with the embodiment shown in FIG. 1, in this embodiment, because the accuracy of the first calibration network model is higher than that of the initial network model, the knowledge-distillation-based learning method uses the first calibration network model as the large teacher network model to guide the small quantized initial network model so that it learns better weight parameters; the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model. This ensures the accuracy of the weight parameters of the obtained first preprocessing model and improves the accuracy of the first preprocessing model.
In an optional embodiment of the present application, as shown in FIG. 3, "adjusting the initial weight parameters of the initial network model according to the first calibration network model based on the knowledge distillation learning method to obtain the first preprocessing model" may include the following steps:
Step 301: Acquire a first training image set.
Here, the first training image set carries hard labels, a hard label being the label corresponding to each image in the first training image set. For example, assuming that both the initial network model and the first calibration network model are used to identify a target object in each image of the first training image set, the hard label indicates that the target object in each image of the first training image set has been annotated.
Specifically, the computer device may receive the first training image set sent by another device, or receive a first training image set input by a user. The hard labels of the first training image set may be annotated manually or by a computer device based on a neural network model; the embodiments of the present application do not specifically limit the manner in which the hard labels of the first training image set are annotated. The first training image set includes a plurality of first training images.
Step 302: Input the first training image set into the initial network model and output a first result.
Specifically, the computer device inputs the first training image set into the initial network model; the initial network model performs feature extraction on the first training image set and outputs the first result based on the extracted features.
Step 303: Input the first training image set into the first calibration network model and output a second result.
Specifically, the computer device inputs the first training image set into the first calibration network model; the first calibration network model performs feature extraction on the first training image set and outputs the second result based on the extracted features.
Step 304: Adjust the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
Specifically, the computer device compares the first result output by the initial network model with the hard labels of the first training image set, and compares the first result output by the initial network model with the second result output by the first calibration network model. The computer device adjusts the initial weight parameters of the initial network model according to the comparison results to obtain the first preprocessing model.
For example, as shown in FIG. 4, image X may be an image in the first training image set, the teacher network is the first calibration network model, and W_T denotes the weight parameters of the teacher network. The student network is the initial network model, and W_S denotes the initial weight parameters of the student network. Image X is input into the teacher network, which outputs the second result, namely P_T; image X is input into the student network, which outputs the first result, namely P_S. Based on P_T, P_S, and the label Y, the computer device adjusts the initial weight parameters of the initial network model to obtain the first preprocessing model.
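As an illustration of the data flow in FIG. 4, the following sketch (PyTorch is assumed as the framework; `teacher` and `student` are placeholder modules standing for the first calibration network model and the quantized initial network model) shows how P_T and P_S could be produced for a batch X, with gradients kept only for the student whose weights W_S are to be adjusted:

```python
import torch

@torch.no_grad()
def teacher_forward(teacher: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """P_T: the teacher (first calibration network model) only provides guidance,
    so no gradients are tracked for it."""
    teacher.eval()
    return teacher(x)

def student_forward(student: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """P_S: the quantized student (initial network model) keeps gradients so that
    its initial weight parameters W_S can be adjusted."""
    return student(x)

# x: a batch of images from the first training image set, y: their hard labels
# p_t = teacher_forward(teacher, x)
# p_s = student_forward(student, x)
```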
Compared with the foregoing embodiment, in this embodiment, the first training image set with hard labels is input into the initial network model and into the first calibration network model, which output the first result and the second result respectively; the relationships between the first result and the second result and between the first result and the hard labels are then used to adjust the initial weight parameters of the initial network model, so that the first result output by the initial network model is driven closer to the second result and to the hard labels, thereby ensuring that the accuracy of the first preprocessing model obtained after the weight parameters are adjusted is improved.
In an optional embodiment of the present application, as shown in FIG. 5, "adjusting the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model" in step 304 may include the following steps:
Step 501: Generate a first loss function based on the first result and the hard labels.
Specifically, the computer device generates the first loss function based on the first result output by the initial network model and the hard labels corresponding to the first training image set. The first loss function represents the loss function of the initial network model during training. Optionally, the first loss function may be denoted as H(Y, P_S), where Y denotes the hard labels corresponding to the first training image set and P_S denotes the first result output by the initial network model.
Step 502: Generate a second loss function based on the first result and the second result.
Specifically, the computer device generates the second loss function based on the first result output by the initial network model and the second result output by the first calibration network model. The second loss function represents the loss incurred by the initial network model, acting as the student network, while imitating the first calibration network model. Optionally, the second loss function may be denoted as H(P_T, P_S), where P_T denotes the second result output by the first calibration network model and P_S denotes the first result output by the initial network model.
Step 503: Generate a first target loss function from the first loss function and the second loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
Optionally, the computer device may add the first loss function and the second loss function to generate the first target loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
For example, the first target loss function may be L_w(x, W_S) = H(Y, P_S) + H(P_T, P_S), where P_T denotes the second result output by the first calibration network model, P_S denotes the first result output by the initial network model, Y denotes the hard labels corresponding to the first training image set, W_S denotes the initial weight parameters of the initial network model, and x may be an image in the first training image set.
Optionally, the computer device may also multiply the first loss function by a first weight parameter, multiply the second loss function by a second weight parameter, add the weighted first loss function and the weighted second loss function to obtain the first target loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
For example, the first target loss function may be L_w(x, W_S) = αH(Y, P_S) + βH(P_T, P_S), where P_T denotes the second result output by the first calibration network model, P_S denotes the first result output by the initial network model, Y denotes the hard labels corresponding to the first training image set, W_S denotes the initial weight parameters of the initial network model, x may be an image in the first training image set, α is the first weight parameter, and β is the second weight parameter. The computer device can adjust the proportion of each loss function in the training process by adjusting the values of α and β; the embodiments of the present application do not specifically limit the values of α and β.
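The following sketch shows one way the weighted first target loss could be computed and used to update W_S. PyTorch is assumed as the framework, H(Y, P_S) is realized as a cross-entropy against the hard labels, H(P_T, P_S) as a KL divergence between softmax outputs, and α = β = 0.5 is only a placeholder; none of these concrete choices are fixed by the embodiment:

```python
import torch
import torch.nn.functional as F

def first_target_loss(p_s: torch.Tensor, p_t: torch.Tensor, y: torch.Tensor,
                      alpha: float = 0.5, beta: float = 0.5) -> torch.Tensor:
    """L_w(x, W_S) = alpha * H(Y, P_S) + beta * H(P_T, P_S).

    p_s, p_t: logits of the student (initial network model) and the teacher
              (first calibration network model) for the same batch x.
    y:        hard labels of the first training image set.
    """
    hard_loss = F.cross_entropy(p_s, y)                    # H(Y, P_S)
    soft_loss = F.kl_div(F.log_softmax(p_s, dim=1),        # H(P_T, P_S)
                         F.softmax(p_t, dim=1),
                         reduction="batchmean")
    return alpha * hard_loss + beta * soft_loss

# One weight-adjustment step for the student:
# loss = first_target_loss(student(x), teacher(x).detach(), y)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```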
Compared with the embodiment of FIG. 3, in this embodiment the first loss function is generated based on the first result output by the initial network model and the hard labels of the first training image set, and the second loss function is generated based on the first result output by the initial network model and the second result output by the first calibration network model. The first loss function characterizes the gap between the first result and the hard labels, and the second loss function characterizes the gap between the first result and the second result. Therefore, the first target loss function generated from the first loss function and the second loss function characterizes both the gap between the first result and the hard labels and the gap between the first result and the second result. Adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model thus improves the accuracy of the first preprocessing model.
In an optional embodiment of the present application, as shown in FIG. 6, "adjusting the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model" in step 103 may include the following steps:
Step 601: Based on the knowledge distillation learning method, adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model.
Here, knowledge distillation refers to the idea of model compression: a larger, already trained network is used, step by step, to teach a smaller network exactly what to do. "Soft labels" refer to the feature vectors output by the large network after each convolutional layer. The small network is then trained to learn the exact behavior of the large network by trying to replicate the output of the large network at every layer, not only the final loss.
Specifically, the computer device may use the knowledge distillation learning method to compare the feature vector output by each network layer of the second calibration network model with the feature vector output by the corresponding layer of the first preprocessing model, and then adjust the activation quantization threshold corresponding to each network layer of the first preprocessing model according to the comparison result.
Step 602: Adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
Specifically, after the computer device has finished adjusting the activation quantization threshold, it may adjust the initial quantization parameter of the activation output of the first preprocessing model according to the correspondence between the adjusted activation quantization threshold and the initial quantization parameter of the activation output, and obtain the target network model from the adjusted quantization parameter of the activation output.
In this implementation, because the accuracy of the second calibration network model is higher than that of the first preprocessing model, the knowledge-distillation-based learning method uses the second calibration network model as the large teacher network model to guide the small quantized first preprocessing model so that it learns better model parameters. Adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model ensures the accuracy of the adjusted activation quantization threshold. Further, adjusting the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model further ensures the accuracy of the adjusted quantization parameter of the activation output, thereby improving the accuracy of the obtained target network model.
In an optional embodiment of the present application, as shown in FIG. 7, "adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model based on the knowledge distillation learning method" in step 601 may include the following steps:
Step 701: Acquire a second training image set.
Specifically, the computer device may receive the second training image set sent by another device, or receive a second training image set input by a user. The second training image set may consist of unlabeled images or labeled images; the embodiments of the present application do not specifically limit the second training image set. In addition, the second training image set may be the same as or different from the first training image set. The second training image set may include a plurality of second training images.
Step 702: Input the second training image set into the first preprocessing model and output a third result.
Specifically, the computer device inputs the second training image set into the first preprocessing model; the first preprocessing model performs feature extraction on the second training image set and outputs the third result based on the extracted features.
Step 703: Input the second training image set into the second calibration network model and output a fourth result.
Specifically, the computer device inputs the second training image set into the second calibration network model; the second calibration network model performs feature extraction on the second training image set and outputs the fourth result based on the extracted features.
Step 704: Adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
Specifically, the computer device compares the third result output by the first preprocessing model with the fourth result output by the second calibration network model, and adjusts the activation quantization threshold of the first preprocessing model according to the comparison result.
For example, as shown in FIG. 8, image X may be an image in the second training image set, the full-precision teacher network is the second calibration network model, and the low-precision student network is the first preprocessing model. The computer device inputs image X into the full-precision teacher network, which outputs the fourth result, namely P_T in FIG. 8; the computer device inputs image X into the low-precision student network, which outputs the third result, namely P_S in FIG. 8. Based on P_T and P_S, the computer device adjusts the activation quantization threshold of the first preprocessing model.
In this embodiment, the second training image set is input into the first preprocessing model and the second calibration network model respectively to output the third result and the fourth result, and the activation quantization threshold of the first preprocessing model is adjusted based on the third result and the fourth result, which ensures the accuracy of the adjusted activation quantization threshold and further ensures the accuracy of the first preprocessing model.
In an optional embodiment of the present application, as shown in FIG. 9, "adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result" in step 704 may include the following steps:
Step 901: Generate a second target loss function based on the third result and the fourth result.
Specifically, the computer device generates the second target loss function based on the third result output by the first preprocessing model and the fourth result output by the second calibration network model. The smaller the value of the second target loss function, the better it indicates that, under the same network structure, the first preprocessing model quantized with the threshold T still has a prediction capability close to that of the second calibration network model.
For example, the second target loss function may be L_A(x, T) = H(P_T, P_S), where P_T denotes the fourth result output by the second calibration network model, P_S denotes the third result output by the first preprocessing model, T denotes the activation quantization threshold of the first preprocessing model, and x denotes an image in the second training image set.
Step 902: Adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
Specifically, the computer device adjusts the activation quantization threshold of the first preprocessing model based on the function value computed from the second target loss function, where a symmetric uniform quantization model is adopted in the embodiments of the present application.
In this embodiment, the second target loss function is generated based on the third result and the fourth result; the smaller the value of the second target loss function, the smaller the gap between the third result and the fourth result. Therefore, adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function ensures the accuracy of the adjusted activation quantization threshold, and further ensures the accuracy of the quantization parameter of the activation output computed from the adjusted activation quantization threshold, thereby improving the accuracy of the target network model.
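The following sketch illustrates how a symmetric uniform activation quantizer with a learnable threshold T could be driven by the second target loss L_A(x, T) = H(P_T, P_S). PyTorch, a straight-through estimator for the rounding, a KL divergence for H, and one threshold per quantizer module are all assumptions made for illustration, not choices fixed by the embodiment:

```python
import torch
import torch.nn.functional as F

class SymmetricActQuant(torch.nn.Module):
    """Symmetric uniform quantization of an activation with a learnable threshold T.

    Values are clipped to [-T, T] and mapped onto 2^b - 1 uniform levels; a
    straight-through estimator lets gradients of the second target loss flow
    back into T while the rounding itself is treated as the identity.
    """
    def __init__(self, init_threshold: float, num_bits: int = 8):
        super().__init__()
        self.T = torch.nn.Parameter(torch.tensor(float(init_threshold)))
        self.qmax = 2 ** (num_bits - 1) - 1

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        scale = self.T / self.qmax
        a_clipped = torch.minimum(torch.maximum(a, -self.T), self.T)
        a_q = torch.round(a_clipped / scale) * scale           # uniform levels
        return a_clipped + (a_q - a_clipped).detach()          # straight-through

# Threshold-adjustment step (weights of the first preprocessing model stay fixed):
# p_s = student(x)                   # student uses SymmetricActQuant modules
# p_t = teacher(x).detach()          # second calibration network model
# loss_a = F.kl_div(F.log_softmax(p_s, dim=1), F.softmax(p_t, dim=1),
#                   reduction="batchmean")
# loss_a.backward(); threshold_optimizer.step(); threshold_optimizer.zero_grad()
```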
Based on the content of the foregoing embodiments, in an optional embodiment of the present application, the computer device may also treat the initial network model and the first preprocessing model as the same model, collectively referred to as the initial network model in the embodiments of the present application. The training process of the initial network model may include the following:
The computer device first adjusts the initial weight parameters of the initial network model according to the first target loss function, and then, based on the adjusted initial weight parameters, adjusts the activation quantization threshold of the initial network model according to the second target loss function. If, after one round of adjustment, neither the weight parameters nor the activation quantization threshold of the initial network model is satisfactory, the computer device continues to adjust the initial weight parameters of the initial network model according to the first target loss function and then, based on the adjusted initial weight parameters, adjusts the activation quantization threshold of the initial network model according to the second target loss function. The computer device adjusts the initial weight parameters and the activation quantization threshold of the initial network model in this alternating loop; after multiple iterations of training, the training of the initial network model is completed and the target network model is generated, thereby ensuring the accuracy of the target network model.
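The alternating procedure described above could be organized as in the following sketch. It reuses `first_target_loss` from the earlier sketch, realizes the second target loss again as a KL divergence, and assumes that `weight_optimizer` holds the low-precision weight parameters while `threshold_optimizer` holds only the activation quantization thresholds; the loaders, teachers, and number of rounds are placeholders:

```python
import torch
import torch.nn.functional as F

def second_target_loss(p_s: torch.Tensor, p_t: torch.Tensor) -> torch.Tensor:
    """L_A = H(P_T, P_S), realized here as a KL divergence (an assumed choice)."""
    return F.kl_div(F.log_softmax(p_s, dim=1), F.softmax(p_t, dim=1),
                    reduction="batchmean")

def alternating_training(student, teacher1, teacher2, loader1, loader2,
                         weight_optimizer, threshold_optimizer, num_rounds=10):
    """Alternately refine the weights (first target loss, teacher1) and the
    activation quantization thresholds (second target loss, teacher2)."""
    for _ in range(num_rounds):
        # Phase 1: adjust the low-precision weight parameters.
        for x, y in loader1:              # first training image set with hard labels
            loss_w = first_target_loss(student(x), teacher1(x).detach(), y)
            weight_optimizer.zero_grad()
            loss_w.backward()
            weight_optimizer.step()

        # Phase 2: keep the weights fixed and adjust the activation quantization
        # thresholds (only threshold_optimizer.step() is called, so the weights
        # are not updated in this phase).
        for x in loader2:                 # second training image set
            loss_a = second_target_loss(student(x), teacher2(x).detach())
            threshold_optimizer.zero_grad()
            loss_a.backward()
            threshold_optimizer.step()
    return student                        # the trained model becomes the target network model
```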
To better illustrate the network model quantization method provided in the embodiments of the present application, the embodiments of the present application provide an overall flow of the network model quantization method, as shown in FIG. 10. The method includes:
Step 1001: Obtain the network model to be processed, which is a pre-trained full-precision network model; quantize the weight parameters and the activation output of the network model to be processed separately according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output; and construct an initial network model based on the initial weight parameters and the initial quantization parameter of the activation output.
Step 1002: Acquire a first training image set.
Step 1003: Input the first training image set into the initial network model and output a first result.
Step 1004: Obtain a first calibration network model, input the first training image set into the first calibration network model, and output a second result.
Step 1005: Generate a first loss function based on the first result and the hard labels.
Step 1006: Generate a second loss function based on the first result and the second result.
Step 1007: Generate a first target loss function from the first loss function and the second loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain a first preprocessing model.
Step 1008: Acquire a second training image set.
Step 1009: Input the second training image set into the first preprocessing model and output a third result.
Step 1010: Obtain a second calibration network model, input the second training image set into the second calibration network model, and output a fourth result.
Step 1011: Generate a second target loss function based on the third result and the fourth result.
Step 1012: Adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
Step 1013: Adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
In an optional embodiment of the present application, the above network model quantization method may, as shown in FIG. 11, include the following steps:
(1) Parameter initialization of the low-precision network: based on the pre-trained full-precision student network, the student network is initialized at low precision using a post-training quantization (PTQ) method, preliminarily determining the low-precision weight values and the activation quantization range values of the student network to be quantized.
(2) Under the guidance of full-precision teacher network 1, the low-precision weight parameters of the student network are learned and adjusted.
(3) Under the guidance of full-precision teacher network 2, the low-precision weight parameters of the student network are fixed, and the activation quantization thresholds of the student network are learned and adjusted.
(4) Network structure deployment: based on the quantized network model parameters, the model structure is deployed on an actual hardware platform to perform the corresponding tasks, such as image classification/detection/recognition tasks or natural language processing tasks.
It should be understood that although the steps in the flowcharts of FIG. 1, FIG. 3, FIGS. 5-7, and FIGS. 9-10 are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 1, FIG. 3, FIGS. 5-7, and FIGS. 9-10 may include multiple sub-steps or stages, which are not necessarily executed at the same time but may be executed at different times; the execution order of these sub-steps or stages is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Correspondingly, referring to FIG. 12, an embodiment of the present application provides a network model quantization apparatus 1200, which includes a quantization processing module 1210, a first adjustment module 1220, and a second adjustment module 1230, wherein:
The quantization processing module 1210 is configured to obtain a network model to be processed, which is a pre-trained full-precision network model, quantize the weight parameters and the activation output of the network model to be processed separately according to the quantization requirements to obtain initial weight parameters and an initial quantization parameter of the activation output, and construct an initial network model based on the initial weight parameters and the initial quantization parameter of the activation output.
The first adjustment module 1220 is configured to obtain a first calibration network model whose accuracy is higher than that of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model.
The second adjustment module 1230 is configured to obtain a second calibration network model whose accuracy is higher than that of the initial network model, and adjust the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
In an embodiment of the present application, the first adjustment module 1220 is specifically configured to adjust the initial weight parameters of the initial network model according to the first calibration network model based on a knowledge distillation learning method to obtain the first preprocessing model.
In an embodiment of the present application, the first adjustment module 1220 is specifically configured to: acquire a first training image set carrying hard labels; input the first training image set into the initial network model and output a first result; input the first training image set into the first calibration network model and output a second result; and adjust the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
In an embodiment of the present application, the first adjustment module 1220 is specifically configured to: generate a first loss function based on the first result and the hard labels; generate a second loss function based on the first result and the second result; generate a first target loss function from the first loss function and the second loss function; and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
Correspondingly, referring to FIG. 13, in an embodiment of the present application, the second adjustment module 1230 includes a first adjustment unit 1231 and a second adjustment unit 1232, wherein:
The first adjustment unit 1231 is configured to adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model based on the knowledge distillation learning method;
The second adjustment unit 1232 is configured to adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
In an embodiment of the present application, the first adjustment unit 1231 is specifically configured to: acquire a second training image set; input the second training image set into the first preprocessing model and output a third result; input the second training image set into the second calibration network model and output a fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
In an embodiment of the present application, the first adjustment unit 1231 is specifically configured to: generate a second target loss function based on the third result and the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
For the specific limitations and beneficial effects of the network model quantization apparatus, reference may be made to the limitations on the network model quantization method above, which are not repeated here. Each module of the above network model quantization apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In an embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in FIG. 14. The computer device includes a processor, a memory, a communication interface, a display screen, and an input apparatus connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be implemented through WIFI, an operator network, NFC (near field communication), or other technologies. When the computer program is executed by the processor, a network model quantization method is implemented. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input apparatus of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art can understand that the structure shown in FIG. 14 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 15. The computer device includes a processor, a memory, and a network interface connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store network model quantization data. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer program is executed by the processor, a network model quantization method is implemented.
Those skilled in the art can understand that the structure shown in FIG. 15 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.
In an embodiment of the present application, a computer device is provided, including a memory and a processor, where a computer program is stored in the memory, and the processor implements the following steps when executing the computer program: obtaining a network model to be processed, the network model to be processed being a pre-trained full-precision network model, quantizing the weight parameters and the activation output of the network model to be processed separately according to the quantization requirements to obtain initial weight parameters and an initial quantization parameter of the activation output, and constructing an initial network model based on the initial weight parameters and the initial quantization parameter of the activation output; obtaining a first calibration network model, the accuracy of the first calibration network model being higher than that of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and obtaining a second calibration network model, the accuracy of the second calibration network model being higher than that of the first preprocessing model, and adjusting the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
In an embodiment of the present application, the processor further implements the following steps when executing the computer program: adjusting the initial weight parameters of the initial network model according to the first calibration network model based on a knowledge distillation learning method to obtain the first preprocessing model.
In an embodiment of the present application, the processor further implements the following steps when executing the computer program: acquiring a first training image set carrying hard labels; inputting the first training image set into the initial network model and outputting a first result; inputting the first training image set into the first calibration network model and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
In an embodiment of the present application, the processor further implements the following steps when executing the computer program: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; generating a first target loss function from the first loss function and the second loss function; and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
In an embodiment of the present application, the processor further implements the following steps when executing the computer program: adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model based on the knowledge distillation learning method; and adjusting the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
In an embodiment of the present application, the processor further implements the following steps when executing the computer program: acquiring a second training image set; inputting the second training image set into the first preprocessing model and outputting a third result; inputting the second training image set into the second calibration network model and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
In an embodiment of the present application, the processor further implements the following steps when executing the computer program: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
在本申请一个实施例中,提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现以下步骤:获取待处理网络模型,待处理网络模型为预训练好的全精度网络模型,根据量化需求对待处理网络模型的权重参数和激活输出分别进行量化处理,得到初始权重参数和激活输出的初始量化参数,基于初始权重参数以及激活输出的初始量化参数,构建初始网络模型;获取第一校准网络模型,第一校准网络模型的精度高于初始网络模型的精度,基于第一校准网络模型对初始网络模型的初始权重参数进行调整,得到第一预处理模型;获取第二校准网络模型,第二校准网络模型的精度高于第一预处理模型的精度,基于第二校准网络模型对第一预处理模型的激活输出的初始量化参数进行调整,得到目标网络模型。In one embodiment of the present application, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the following steps are implemented: obtaining a network model to be processed, the network model to be processed is pre-trained According to the quantization requirements, the weight parameters and activation output of the network model to be processed are respectively quantized to obtain the initial weight parameters and the initial quantization parameters of the activation output. Based on the initial weight parameters and the initial quantization parameters of the activation output, the initial Network model; obtain a first calibration network model, the accuracy of the first calibration network model is higher than the accuracy of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; obtain The second calibration network model, the precision of the second calibration network model is higher than the precision of the first preprocessing model, and the initial quantization parameter of the activation output of the first preprocessing model is adjusted based on the second calibration network model to obtain the target network model.
In one embodiment of the present application, when the computer program is executed by the processor, the following step is also implemented: adjusting the initial weight parameters of the initial network model according to the first calibration network model by means of a knowledge-distillation-based learning method to obtain the first preprocessing model.
In one embodiment of the present application, when the computer program is executed by the processor, the following steps are also implemented: acquiring a first training image set, where the first training image set carries hard labels; inputting the first training image set into the initial network model and outputting a first result; inputting the first training image set into the first calibration network model and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model.
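For this first-stage step, a minimal training-loop sketch is given below. It reuses the `first_target_loss` sketch above, treats the first calibration network model as a frozen teacher, and updates the initial weight parameters of the quantized student. The optimizer, hyperparameters, and the assumption that the loader yields `(images, hard_labels)` pairs are illustrative choices, not the claimed implementation.

```python
import torch

def adjust_initial_weights(initial_model, calib_model_1, loader, epochs=1, lr=1e-4):
    """Stage-one fine-tuning loop; relies on the first_target_loss sketch above."""
    optimizer = torch.optim.SGD(initial_model.parameters(), lr=lr, momentum=0.9)
    calib_model_1.eval()                                  # teacher is kept frozen
    for _ in range(epochs):
        for images, hard_labels in loader:                # first training image set with hard labels
            with torch.no_grad():
                second_result = calib_model_1(images)     # teacher output
            first_result = initial_model(images)          # student output
            loss = first_target_loss(first_result, second_result, hard_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return initial_model                                  # now serves as the first preprocessing model
```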
In one embodiment of the present application, when the computer program is executed by the processor, the following steps are also implemented: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; generating a first target loss function from the first loss function and the second loss function; and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
In one embodiment of the present application, when the computer program is executed by the processor, the following steps are also implemented: adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model by means of a knowledge-distillation-based learning method; and adjusting the initial quantization parameters of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
In one embodiment of the present application, when the computer program is executed by the processor, the following steps are also implemented: acquiring a second training image set; inputting the second training image set into the first preprocessing model and outputting a third result; inputting the second training image set into the second calibration network model and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
In one embodiment of the present application, when the computer program is executed by the processor, the following steps are also implemented: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
Those skilled in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also include a combination of the above types of memory.
Although the embodiments of the present application have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the present application, and all such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

  1. A network model quantization method, characterized in that the method comprises:
    acquiring a network model to be processed, wherein the network model to be processed is a pre-trained full-precision network model, quantizing weight parameters and an activation output of the network model to be processed respectively according to a quantization requirement to obtain initial weight parameters and initial quantization parameters of the activation output, and constructing an initial network model based on the initial weight parameters and the initial quantization parameters of the activation output;
    acquiring a first calibration network model, wherein the accuracy of the first calibration network model is higher than the accuracy of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and
    acquiring a second calibration network model, wherein the accuracy of the second calibration network model is higher than the accuracy of the first preprocessing model, and adjusting the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
  2. The method according to claim 1, wherein the adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model comprises:
    adjusting the initial weight parameters of the initial network model according to the first calibration network model by means of a knowledge-distillation-based learning method to obtain the first preprocessing model.
  3. The method according to claim 2, wherein the adjusting the initial weight parameters of the initial network model according to the first calibration network model by means of the knowledge-distillation-based learning method to obtain the first preprocessing model comprises:
    acquiring a first training image set, wherein the first training image set carries hard labels;
    inputting the first training image set into the initial network model and outputting a first result;
    inputting the first training image set into the first calibration network model and outputting a second result; and
    adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model.
  4. The method according to claim 3, wherein the adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model comprises:
    generating a first loss function based on the first result and the hard labels;
    generating a second loss function based on the first result and the second result; and
    generating a first target loss function from the first loss function and the second loss function, and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  5. The method according to claim 1, wherein the adjusting the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model comprises:
    adjusting an activation quantization threshold of the first preprocessing model according to the second calibration network model by means of a knowledge-distillation-based learning method; and
    adjusting the initial quantization parameters of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  6. The method according to claim 5, wherein the adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model by means of the knowledge-distillation-based learning method comprises:
    acquiring a second training image set;
    inputting the second training image set into the first preprocessing model and outputting a third result;
    inputting the second training image set into the second calibration network model and outputting a fourth result; and
    adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  7. The method according to claim 6, wherein the adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result comprises:
    generating a second target loss function based on the third result and the fourth result; and
    adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
  8. A network model quantization apparatus, characterized in that the apparatus comprises:
    a quantization processing module, configured to acquire a network model to be processed, wherein the network model to be processed is a pre-trained full-precision network model, quantize weight parameters and an activation output of the network model to be processed respectively according to a quantization requirement to obtain initial weight parameters and initial quantization parameters of the activation output, and construct an initial network model based on the initial weight parameters and the initial quantization parameters of the activation output;
    a first adjustment module, configured to acquire a first calibration network model, wherein the accuracy of the first calibration network model is higher than the accuracy of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and
    a second adjustment module, configured to acquire a second calibration network model, wherein the accuracy of the second calibration network model is higher than the accuracy of the initial network model, and adjust the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
  9. A computer device, characterized by comprising a memory and a processor, wherein the memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the computer instructions so as to perform the network model quantization method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a computer to perform the network model quantization method according to any one of claims 1 to 7.
PCT/CN2022/078256 2021-09-28 2022-02-28 Network model quantization method and apparatus, and computer device and storage medium WO2023050707A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111139349.XA CN113610232B (en) 2021-09-28 2021-09-28 Network model quantization method and device, computer equipment and storage medium
CN202111139349.X 2021-09-28

Publications (1)

Publication Number Publication Date
WO2023050707A1 true WO2023050707A1 (en) 2023-04-06

Family

ID=78343259

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078256 WO2023050707A1 (en) 2021-09-28 2022-02-28 Network model quantization method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN113610232B (en)
WO (1) WO2023050707A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610232B (en) * 2021-09-28 2022-02-22 苏州浪潮智能科技有限公司 Network model quantization method and device, computer equipment and storage medium
CN115570228B (en) * 2022-11-22 2023-03-17 苏芯物联技术(南京)有限公司 Intelligent feedback control method and system for welding pipeline gas supply
CN117689044A (en) * 2024-02-01 2024-03-12 厦门大学 Quantification method suitable for vision self-attention model

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190164057A1 (en) * 2019-01-30 2019-05-30 Intel Corporation Mapping and quantification of influence of neural network features for explainable artificial intelligence
US20210142177A1 (en) * 2019-11-13 2021-05-13 Nvidia Corporation Synthesizing data for training one or more neural networks
CN111753761B (en) * 2020-06-28 2024-04-09 北京百度网讯科技有限公司 Model generation method, device, electronic equipment and storage medium
CN112200296B (en) * 2020-07-31 2024-04-05 星宸科技股份有限公司 Network model quantization method and device, storage medium and electronic equipment
CN112308019B (en) * 2020-11-19 2021-08-17 中国人民解放军国防科技大学 SAR ship target detection method based on network pruning and knowledge distillation
CN113011581B (en) * 2021-02-23 2023-04-07 北京三快在线科技有限公司 Neural network model compression method and device, electronic equipment and readable storage medium
CN112988975A (en) * 2021-04-09 2021-06-18 北京语言大学 Viewpoint mining method based on ALBERT and knowledge distillation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276451A (en) * 2019-06-28 2019-09-24 南京大学 One kind being based on the normalized deep neural network compression method of weight
CN110443165A (en) * 2019-07-23 2019-11-12 北京迈格威科技有限公司 Neural network quantization method, image-recognizing method, device and computer equipment
CN112016674A (en) * 2020-07-29 2020-12-01 魔门塔(苏州)科技有限公司 Knowledge distillation-based convolutional neural network quantification method
CN112508169A (en) * 2020-11-13 2021-03-16 华为技术有限公司 Knowledge distillation method and system
CN113610232A (en) * 2021-09-28 2021-11-05 苏州浪潮智能科技有限公司 Network model quantization method and device, computer equipment and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116579407A (en) * 2023-05-19 2023-08-11 北京百度网讯科技有限公司 Compression method, training method, processing method and device of neural network model
CN116579407B (en) * 2023-05-19 2024-02-13 北京百度网讯科技有限公司 Compression method, training method, processing method and device of neural network model
CN116542344A (en) * 2023-07-05 2023-08-04 浙江大华技术股份有限公司 Model automatic deployment method, platform and system
CN116721399A (en) * 2023-07-26 2023-09-08 之江实验室 Point cloud target detection method and device for quantitative perception training
CN116721399B (en) * 2023-07-26 2023-11-14 之江实验室 Point cloud target detection method and device for quantitative perception training
CN117077740A (en) * 2023-09-25 2023-11-17 荣耀终端有限公司 Model quantization method and device
CN117077740B (en) * 2023-09-25 2024-03-12 荣耀终端有限公司 Model quantization method and device

Also Published As

Publication number Publication date
CN113610232B (en) 2022-02-22
CN113610232A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
WO2023050707A1 (en) Network model quantization method and apparatus, and computer device and storage medium
US10991074B2 (en) Transforming source domain images into target domain images
US20230376771A1 (en) Training machine learning models by determining update rules using neural networks
US20210201147A1 (en) Model training method, machine translation method, computer device, and storage medium
US11657254B2 (en) Computation method and device used in a convolutional neural network
US10789734B2 (en) Method and device for data quantization
CN110880036B (en) Neural network compression method, device, computer equipment and storage medium
CN112106081A (en) Application development platform and software development suite for providing comprehensive machine learning service
TWI767000B (en) Method and computer storage medium of generating waveform
US20190340492A1 (en) Design flow for quantized neural networks
US20230042221A1 (en) Modifying digital images utilizing a language guided image editing model
TWI744724B (en) Method of processing convolution neural network
JP2022169743A (en) Information extraction method and device, electronic equipment, and storage medium
KR102508860B1 (en) Method, device, electronic equipment and medium for identifying key point positions in images
WO2022021834A1 (en) Neural network model determination method and apparatus, and electronic device, and medium, and product
WO2016142285A1 (en) Method and apparatus for image search using sparsifying analysis operators
JP2023547010A (en) Model training methods, equipment, and electronics based on knowledge distillation
WO2023020456A1 (en) Network model quantification method and apparatus, device, and storage medium
Huai et al. Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
JP6467893B2 (en) Information processing system, information processing method, and program
US20220044109A1 (en) Quantization-aware training of quantized neural networks
CN117315758A (en) Facial expression detection method and device, electronic equipment and storage medium
US20230046088A1 (en) Method for training student network and method for recognizing image
US10530387B1 (en) Estimating an optimal ordering for data compression
US20240062057A1 (en) Regularizing targets in model distillation utilizing past state knowledge to improve teacher-student machine learning models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874110

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE